N.B. This is the full version of the conference paper published
as [12]. This version includes an Appendix with proofs
and additional results, and corrects a few typographical errors
discovered after publication. It also adds an improvement
in the error bounds achieved under (ε, δ)-differential
privacy, included as Theorem 5.

ABSTRACT 

Differential privacy is a robust privacy standard that has 
been successfully applied to a range of data analysis tasks. 
But despite much recent work, optimal strategies for answer- 
ing a collection of related queries are not known. 

We propose the matrix mechanism, a new algorithm for 
answering a workload of predicate counting queries. Given 
a workload, the mechanism requests answers to a different 
set of queries, called a query strategy, which are answered 
using the standard Laplace mechanism. Noisy answers to 
the workload queries are then derived from the noisy answers 
to the strategy queries. This two stage process can result in 
a more complex correlated noise distribution that preserves 
differential privacy but increases accuracy. 

We provide a formal analysis of the error of query answers 
produced by the mechanism and investigate the problem of 
computing the optimal query strategy in support of a given 
workload. We show this problem can be formulated as a 
rank-constrained semidefinite program. Finally, we analyze 
two seemingly distinct techniques, whose similar behavior is 
explained by viewing them as instances of the matrix mech- 
anism. 

Categories and Subject Descriptors: H.2.8 [Database 
Management]: Database Applications — Statistical databases; 
G.1 [Numerical Analysis]: Optimization

General Terms: Algorithms, Security, Theory 

Keywords: private data analysis, output perturbation, diff- 
erential privacy, semidefinite program. 



1. INTRODUCTION 

Differential privacy [8] offers participants in a dataset the 
compelling assurance that information released about the 
dataset is virtually indistinguishable whether or not their 
personal data is included. It protects against powerful adversaries
and offers precise accuracy guarantees. As outlined
in recent surveys [5, 6, 7], it has been applied successfully to 
a range of data analysis tasks and to the release of summary 
statistics such as contingency tables [1], histograms [11, 17], 
and order statistics [13]. 

Differential privacy is achieved by introducing randomness 
into query answers. The original algorithm for achieving diff- 
erential privacy, commonly called the Laplace mechanism [8] , 
returns the sum of the true answer and random noise drawn 
from a Laplace distribution. The scale of the distribution is 
determined by a property of the query called its sensitivity: 
roughly the maximum possible change to the query answer 
induced by the addition or removal of one tuple. Higher sen- 
sitivity queries are more revealing about individual tuples 
and must receive greater noise. 

If an analyst requires only the answer to a single query 
about the database, then the Laplace mechanism has re- 
cently been shown optimal in a strong sense [9]. But when 
multiple query answers are desired, an optimal mechanism 
is not known. 

At the heart of our investigation is the suboptimal be- 
havior of the Laplace mechanism when answers to a set of 
correlated queries are requested. We say two queries are cor- 
related if the change of a tuple in the underlying database can 
affect both answers. Asking correlated queries can lead to 
suboptimal results because correlation increases sensitivity 
and therefore the magnitude of the noise. The most extreme 
example is when two duplicate queries are submitted. The 
sensitivity of the pair of queries is twice that of an individual 
query. This means the magnitude of the noise added to each 
query is doubled, but combining the two noisy answers (in 
the natural way, by averaging) gives a less accurate result 
than if only one query had been asked. 
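The arithmetic behind the duplicate-query example can be checked with a short sketch. It assumes a single counting query of sensitivity 1 and an illustrative value ε = 1 (both are our assumptions), and uses the fact that a Laplace distribution with scale s has variance 2s².

```python
from fractions import Fraction

def laplace_variance(scale):
    """Variance of a Laplace distribution with the given scale: 2 * scale^2."""
    return 2 * scale ** 2

eps = Fraction(1)          # illustrative privacy parameter (assumption)
single_sensitivity = 1     # one counting query (assumption)

# One query asked alone: noise scale = sensitivity / eps.
var_single = laplace_variance(single_sensitivity / eps)

# Two duplicate queries: the pair has twice the sensitivity,
# so each answer receives noise with scale 2 / eps.
var_each = laplace_variance(2 * single_sensitivity / eps)

# Averaging two independent noisy answers halves the variance.
var_avg = var_each / 2

print(var_single, var_each, var_avg)  # 2 8 4
```

Under these assumptions the averaged answer has twice the variance of the answer obtained by asking the query once.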

Correlated workloads arise naturally in practice. If mul- 
tiple users are interacting with a database, the server may 
require that they share a common privacy budget to avoid 
the threat of a privacy breach from collusion. Yet, in acting 
independently, they can easily issue redundant or correlated 
queries. Further, in some settings it is appealing to simul- 
taneously answer a large structured set of queries, (e.g. all 
range queries), which are inherently correlated. 

In this work we propose the matrix mechanism, an improved
mechanism for answering a workload of predicate
counting queries. Each query is a linear combination of base
counts reporting the number of tuples with the given com- 
bination of attribute values. A set of such queries is repre- 
sented as a matrix in which each row contains the coefficients 
of a linear query. Histograms, sets of marginals, and data 
cubes can be viewed as workloads of linear counting queries. 

The matrix mechanism is built on top of the Laplace mech- 
anism. Given a workload of queries, the matrix mechanism 
asks a different set of queries, called a query strategy, and 
obtains noisy answers by invoking the Laplace mechanism. 
Noisy answers to the workload queries are then derived from 
the noisy answers to the strategy queries. There may be more 
than one way to estimate a workload query from the answers 
to the strategy queries. In this case the derived answer of 
the matrix mechanism combines the available evidence into 
a single consistent estimate that minimizes the variance of 
the noisy answer. 

While the Laplace mechanism always adds independent 
noise to each query in the workload, the noise of the matrix 
mechanism may consist of a complex linear combination of 
independent noise samples. Such correlated noise preserves 
differential privacy but can allow more accurate results, par- 
ticularly for workloads with correlated queries. 

The accuracy of the matrix mechanism depends on the 
query strategy chosen to instantiate it. This paper explores 
the problem of designing the optimal strategy for a given 
workload. To understand the optimization problem we first 
analyze the error of any query supported by a strategy. The 
error is determined by two essential features of the strategy: 
its error profile, a matrix which governs the distribution of 
error across queries, and its sensitivity, a scalar term that 
uniformly scales the error on all queries. Accurately answer- 
ing a workload of queries requires choosing a strategy with 
a good error profile (relatively low error for the queries in 
the workload) and low sensitivity. We show that natural 
strategies succeed at one, but not both, of these objectives. 

We then formalize the optimization problem of finding 
the strategy that minimizes the total error on a workload 
of queries as a semidefinite program with rank constraints.
Such problems can be solved with iterative algorithms, but 
we are not aware of results that bound the number of itera- 
tions until convergence. In addition, we propose two efficient 
approximations for deciding on a strategy, as well as a heuris- 
tic that can be used to improve an existing strategy. 

Lastly, our framework encompasses several techniques pro- 
posed in the literature. We use it to analyze two tech- 
niques [11, 17], each of which can be seen as an instance 
of the matrix mechanism designed to support the workload 
consisting of all range queries. Our analysis provides insight 
into the common behavior of these seemingly distinct techniques,
and we prove novel bounds on their error.

After a background discussion we describe the matrix mech- 
anism in Section 3. We analyze its error formally in Section 4. 
In Section 5, we characterize the optimization problem of 
choosing a query strategy and propose approximations. We 
use our results to compare existing strategies in Section 6.
We discuss related work, including other recent techniques 
that improve on the Laplace mechanism, in Section 7. 

2. BACKGROUND 

This section describes the domain and queries considered, 
and reviews the basic principles of differential privacy. We 
use standard terminology of linear algebra throughout the
paper. Matrices and vectors are indicated with bold letters
(e.g., A or x) and their elements are indicated as $a_{ij}$ or $x_i$.
For a matrix A, $A^t$ is its transpose, $A^{-1}$ is its inverse, and
trace(A) is its trace (the sum of values on the main diagonal).
We use $diag(c_1, \ldots, c_n)$ to indicate an n × n diagonal matrix
with scalars $c_i$ on the diagonal. We use $0^{m \times n}$ to indicate a
matrix of zeroes with m rows and n columns.

2.1 Linear queries 

The database is an instance I of relational schema R(A),
where A is a set of attributes. We denote by dom(A) the
cross-product of the domains of attributes in A. The analyst
chooses a set of attributes B ⊆ A relevant to their task.
For example, if the analyst is interested in a subset of
two-dimensional range queries over attributes $A_1$ and $A_2$, they
would set B = {$A_1$, $A_2$}. We then form a frequency vector x
with one entry for each element of dom(B). For simplicity we
assume dom(B) = {1, 2, ..., n} and for each i ∈ dom(B), $x_i$
is the count of tuples equal to i in the projection $\Pi_B(I)$. We
represent x as a column vector of counts: $x = [x_1 \ldots x_n]^t$.

A linear query computes a linear combination of the counts
in x.

Definition 2.1 (Linear query). A linear query is a
length-n row vector $q = [q_1 \ldots q_n]$ with each $q_i \in \mathbb{R}$. The
answer to a linear query q on x is the vector product
$qx = q_1 x_1 + \cdots + q_n x_n$.

We will consider sets of linear queries organized into the 
rows of a query matrix. 

Definition 2.2 (Query matrix). A query matrix is a
collection of m linear queries, arranged by rows to form an
m × n matrix.

If Q is an m X n query matrix, the query answer for Q 
is a length m column vector of query results, which can be 
computed as the matrix product Qx. 

Example 1. Figure 1 shows three query matrices, which
we use as running examples throughout the paper. I4 is the
identity matrix of size four. This matrix consists of four
queries, each asking for an individual element of x. H4 contains
seven queries, which represent a binary hierarchy of
sums: the first row is the sum over the entire domain (returning
the total number of tuples in I), the second and third
rows each sum one half of the domain, and the last four rows
return individual elements of x. Y4 is the matrix of the Haar
wavelet. It can also be seen as a hierarchical set of queries:
the first row is the total sum, the second row computes the
difference between sums in two halves of the domain, and the
last two rows return differences between smaller partitions of
the domain. In Section 6 we study general forms of these
matrices for domains of size n [11, 17].
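The three running examples can be written out explicitly. The sketch below transcribes the matrices from the description in Example 1 (the entries of Y4 follow the standard unnormalized Haar layout, which is our reading of the text) and evaluates Qx for a hypothetical frequency vector.

```python
# Query matrices over dom = {1, 2, 3, 4}, transcribed from Example 1.
I4 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1]]

H4 = [[1, 1, 1, 1],   # sum over the entire domain
      [1, 1, 0, 0],   # left half
      [0, 0, 1, 1],   # right half
      [1, 0, 0, 0],   # individual counts
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1]]

Y4 = [[1, 1, 1, 1],    # total sum
      [1, 1, -1, -1],  # left half minus right half
      [1, -1, 0, 0],   # difference within the left half
      [0, 0, 1, -1]]   # difference within the right half

def answer(Q, x):
    """True query answer Qx for query matrix Q and frequency vector x."""
    return [sum(q_j * x_j for q_j, x_j in zip(row, x)) for row in Q]

x = [10, 20, 30, 40]   # hypothetical frequency vector

print(answer(I4, x))   # [10, 20, 30, 40]
print(answer(H4, x))   # [100, 30, 70, 10, 20, 30, 40]
print(answer(Y4, x))   # [100, -40, -10, -10]
```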

2.2 The Laplace mechanism 

Because the true counts in x must be protected, only 
noisy answers to queries, satisfying differential privacy, are 
released. We refer to the noisy answer to a query as an 
estimate for the true query answer. The majority of our results
concern classical ε-differential privacy, reviewed below.
(We consider a relaxation of differential privacy briefly in 
Sec. 5.2.) 
Figure 1: Query matrices with dom = {1, 2, 3, 4}. Each
is full rank. I4 returns each unit count. H4 computes
seven sums, hierarchically partitioning the domain.
Y4 is based on the Haar wavelet.



Informally, a randomized algorithm is differentially private
if it produces statistically close outputs whether or not any
one individual's record is present in the database. For any
input database I, let nbrs(I) denote the set of neighboring
databases, each differing from I by at most one record; i.e.,
if I' ∈ nbrs(I), then |(I − I') ∪ (I' − I)| = 1.

Definition 2.3 (ε-differential privacy). A randomized
algorithm $\mathcal{K}$ is ε-differentially private if for any instance
I, any I' ∈ nbrs(I), and any subset of outputs $S \subseteq Range(\mathcal{K})$,
the following holds:

$$\Pr[\mathcal{K}(I) \in S] \le \exp(\epsilon) \times \Pr[\mathcal{K}(I') \in S],$$

where the probability is taken over the randomness of $\mathcal{K}$.

Differential privacy can be achieved by adding random
noise to query answers. The noise added is a function of the
privacy parameter, ε, and a property of the queries called
sensitivity. The sensitivity of a query bounds the possible
change in the query answer over any two neighboring
databases. For a single linear query, the sensitivity bounds
the absolute difference of the query answers. For a query
matrix, which returns a vector of answers, the sensitivity
bounds the $L_1$ distance between the answer vectors resulting
from any two neighboring databases. The following proposition
extends the standard notion of query sensitivity to query
matrices. Note that because two neighboring databases I
and I' differ in exactly one tuple, it follows that their
corresponding vectors x and x' differ in exactly one component,
by exactly one.

Proposition 1 (Query matrix sensitivity). The sensitivity
of matrix Q, denoted $\Delta_Q$, is:

$$\Delta_Q \overset{\text{def}}{=} \max_{\|x - x'\|_1 = 1} \|Qx - Qx'\|_1 = \max_j \sum_{i=1}^{m} |q_{ij}|.$$

Thus the sensitivity of a query matrix is the maximum $L_1$
norm of a column.

Example 2. The sensitivities of the query matrices in Figure 1
are: $\Delta_{I_4} = 1$ and $\Delta_{H_4} = \Delta_{Y_4} = 3$. A change by one
in any component $x_i$ will change the query answer $I_4 x$ by exactly
one, but will change $H_4 x$ and $Y_4 x$ by three since each
$x_i$ contributes to three linear queries in both H4 and Y4.
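Proposition 1 reduces sensitivity to a column-norm computation. The following sketch applies it to the three example matrices; the matrix entries are transcribed from the description in Example 1, and the helper name `sensitivity` is ours.

```python
def sensitivity(Q):
    """Sensitivity of query matrix Q: the maximum L1 norm of a column."""
    n = len(Q[0])
    return max(sum(abs(row[j]) for row in Q) for j in range(n))

I4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Y4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]

print(sensitivity(I4), sensitivity(H4), sensitivity(Y4))  # 1 3 3
```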

The following proposition describes an ε-differentially private
algorithm, adapted from Dwork et al. [5], for releasing
noisy answers to the workload of queries in matrix W. The
algorithm adds independent random samples from a scaled
Laplace distribution.



Proposition 2 (Laplace mechanism). Let W be a query
matrix consisting of m queries, and let b be a length-m
column vector consisting of independent samples from a
Laplace distribution with scale 1. Then the randomized
algorithm $\mathcal{L}$ that outputs the following vector is ε-differentially
private:

$$\mathcal{L}(W, x) = Wx + \left(\frac{\Delta_W}{\epsilon}\right) b.$$



Recall that Wx is a length-m column vector representing
the true answer to each linear query in W. The algorithm
adds independent random noise, scaled by ε and the sensitivity
of W. Thus $\mathcal{L}(W, x)$, which we call the output vector,
is a length-m column vector containing a noisy answer for
each linear query in W.
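A minimal sketch of Proposition 2 follows. It is illustrative only: the function names are ours, the frequency vector is hypothetical, and Laplace noise is drawn as the difference of two independent exponential variates, a standard identity. A production implementation would need more care (for example, about floating-point effects).

```python
import random

def sensitivity(Q):
    """Maximum L1 norm of a column of Q (Proposition 1)."""
    return max(sum(abs(row[j]) for row in Q) for j in range(len(Q[0])))

def laplace_sample(rng):
    # The difference of two independent Exp(1) draws follows a
    # Laplace distribution with scale 1.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def laplace_mechanism(W, x, eps, rng):
    """Return Wx + (sensitivity(W) / eps) * b with b i.i.d. Laplace(1)."""
    scale = sensitivity(W) / eps
    true_answers = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    return [t + scale * laplace_sample(rng) for t in true_answers]

H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
x = [10, 20, 30, 40]   # hypothetical frequency vector

noisy = laplace_mechanism(H4, x, eps=0.5, rng=random.Random(0))
print(noisy)           # seven noisy answers, one per row of H4
```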

3. THE MATRIX MECHANISM 

Central to our approach is the distinction between a query 
strategy and a query workload. Both are sets of linear queries 
represented as matrices. The workload queries are those 
queries for which the analyst requires answers. Submitting 
the workload queries to the Laplace mechanism described 
above is the standard approach, but may lead to greater er- 
ror than necessary in query estimates. Instead we submit 
a different set of queries to the differentially private server, 
called the query strategy. We then use the estimates to the 
strategy queries to derive estimates to the workload queries. 
Because there may be more than one derived estimate for a 
workload query, we wish to find a single consistent estimate 
with least error. 

In this section we present the formal basis for this deriva- 
tion process. We define the set of queries whose estimates 
can be derived and we provide optimal mechanisms for de- 
riving estimates. Using this derivation, we define the matrix 
mechanism, an extension of the Laplace mechanism that uses 
a query strategy A to answer a workload W of queries. The 
remainder of the paper will then investigate, given W, how 
to choose the strategy A to instantiate the mechanism. 

3.1 Deriving new query answers 

Suppose we use the Laplace mechanism to get noisy an- 
swers to a query strategy A. Then there is sufficient evi- 
dence, in the noisy answers to A, to construct an estimate 
for a workload query w if w can be expressed as a linear 
combination of the strategy queries: 

Definition 3.1 (Queries supported by a strategy). 
A strategy A supports a query w i/ w can be expressed as a 
linear combination of the rows of A. 

In other words, A supports any query w that is in the subspace
defined by the rows of A. If a strategy matrix consists
of at least n linearly independent row vectors (i.e., its row
space has dimension n), then it follows immediately that it
supports all linear queries. Such matrices are said to have
full rank. We restrict our attention to full rank strategies
and defer discussion of this choice to the end of the section.

To derive new query answers from the answers to A we
first compute an estimate, denoted $\hat{x}_A$, of the true counts x.
Then the derived estimate for an arbitrary linear query w is
simply the vector product $w\hat{x}_A$. The estimate of the true
counts is computed as follows:



Definition 3.2 (Estimate of x using A). Let A be a
full rank query strategy consisting of m queries, and let
$y = \mathcal{L}(A, x)$ be the noisy answers to A. Then $\hat{x}_A$ is the
estimate for x defined as:

$$\hat{x}_A = A^+ y,$$

where $A^+ = (A^t A)^{-1} A^t$ is the pseudo-inverse of A.

Because A has full rank, the number of queries in A, m,
must be at least n. When m = n, then A is invertible and
$A^+ = A^{-1}$. Otherwise, when m > n, A is not invertible, but
$A^+$ acts as a left-inverse for A because $A^+ A = I$. We explain
next the justification for the estimate $\hat{x}_A$ above, and provide
examples, considering separately the case where m = n and
the case where m > n.

A is square. In this case A is an n x n matrix of rank n, 
and it is therefore invertible. Then given the output vector 
y, it is always possible to compute a unique estimate for the 
true counts by inverting A. The expression in Definition 3.2
then simplifies to:

$$\hat{x}_A = A^{-1} y.$$

In this case, query strategy A can be viewed as a linear 
transformation of the true counts, to which noise is added by 
the privacy mechanism. The transformation is then reversed, 
by the inverse of A, to produce a consistent estimate of the 
true counts. 

Example 3. In Figure 1, I4 and Y4 are both square, full
rank matrices which we will use as example query strategies.
The inverse of I4 is just I4 itself, reflecting the fact that since
I4 asks for individual counts of x, the estimate $\hat{x}$ is just the
output vector y. The inverse of Y4 is shown in Figure 2(c).
Row i contains the coefficients used to construct an estimate
of count $x_i$. For example, the first component of $\hat{x}_A$ will be
computed as the following weighted sum: $.25y_1 + .25y_2 + .5y_3$.

Specific transformations of this kind have been studied before.
A Fourier transformation is used in [1]; however, rather
than recovering the entire set of counts, the emphasis there is on a
set of marginals. A transformation using the Haar wavelet is
considered in [17]. Our insight is that any full rank matrix is a
viable strategy, and our goal is to understand the properties
of matrices that make them good strategies. In Section 6 we
analyze the wavelet technique [17] in detail.

A is rectangular. When m > n, we cannot invert A and 
we must employ a different technique for deriving estimates 
for the counts in x. In this case, the matrix A contains n 
linearly independent rows, but has additional row queries as 
well. These are additional noisy observations that should be 
integrated into our estimate $\hat{x}_A$. Viewed another way, we
have a system of equations given by y = Ax, with more 
equations (m) than the number of unknowns in x (n). The 
system of equations is likely to be inconsistent due to the 
addition of random noise. 

We adapt techniques of linear regression, computing an
estimate $\hat{x}_A$ that minimizes the sum of the squared deviations
from the output vector. Because we assume A has full rank,
this estimate, called the least squares solution, is unique. The
expression in Definition 3.2 computes the well-known least
squares solution as $\hat{x}_A = (A^t A)^{-1} A^t y$.



This least squares approach was originally proposed in [11] 
as a method for avoiding inconsistent answers in differentially 
private outputs, and it was shown to improve the accuracy 
of a set of histogram queries. In that work, a specific query 
strategy is considered (related to our example H4) consisting 
of a hierarchical set of queries. An efficient algorithm is 
proposed for computing the least squares solution in this 
special case. We analyze this strategy further in Sec. 6. 

Example 4. H4, shown in Figure 1, is a rectangular full
rank matrix with m = 7. The output vector $y = \mathcal{L}(H_4, x)$
does not necessarily imply a unique estimate. For example,
each of the following are possible estimates of $x_1$: $y_4$, $y_2 - y_5$,
$y_1 - y_3 - y_5$, each likely to result in different answers.
The reconstruction matrix for H4, $H_4^+$, shown in Fig. 2(b),
describes the unique least squares solution. The estimate for
$x_1$ is a weighted combination of values in the output vector:
$\frac{3}{21}y_1 + \frac{5}{21}y_2 - \frac{2}{21}y_3 + \frac{13}{21}y_4 - \frac{8}{21}y_5 - \frac{1}{21}y_6 - \frac{1}{21}y_7$. Notice that
greatest weight is given to $y_4$, which is the noisy answer to
the query that asks directly for $x_1$; but the other output values
contribute to the final estimate.
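The reconstruction coefficients in Examples 3 and 4 can be reproduced exactly with rational arithmetic. The sketch below computes $A^+ = (A^t A)^{-1} A^t$ using a small Gauss-Jordan routine; the helper names are ours and the matrices are transcribed from the description in Example 1.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a square invertible matrix, over Fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pinv(A):
    """Pseudo-inverse (A^t A)^{-1} A^t of a full rank m x n matrix, m >= n."""
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)

H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Y4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]

# Coefficients used to estimate x_1 from y under each strategy.
print([str(c) for c in pinv(H4)[0]])  # ['1/7', '5/21', '-2/21', '13/21', '-8/21', '-1/21', '-1/21']
print([str(c) for c in pinv(Y4)[0]])  # ['1/4', '1/4', '1/2', '0']
```

Note that 3/21 prints in lowest terms as 1/7. Multiplying `pinv(H4)` by `H4` returns the identity, which is the left-inverse property used in Definition 3.2.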

In summary, whether m = n or m > n, Definition 3.2
shows how to derive a unique, consistent estimate $\hat{x}_A$ for the
true counts x. Once $\hat{x}_A$ is computed, the estimate for any
w is computed as $w\hat{x}_A$. The following theorem shows that
$\hat{x}_A$ is an unbiased estimate of x and that in a certain sense
it is the best possible estimate given the answers to strategy
query A.

Theorem 1 (Minimal Variance of estimate of x).
Given noisy output $y = \mathcal{L}(A, x)$, the estimate $\hat{x}_A = A^+ y$ is
unbiased (i.e., $E[\hat{x}_A] = x$), and has the minimum variance
among all unbiased estimates that are linear in y.

The theorem follows from an application of the Gauss- 
Markov theorem [16] and extends a similar result from [11]. 

3.2 The Matrix Mechanism

In the presentation above, we used the Laplace mechanism
to get noisy answers y to the queries in A, and then
derived $\hat{x}_A$, from which any workload query could then be
estimated. It is convenient to view this technique as a new
differentially private mechanism which produces noisy answers
to workload W directly. This mechanism is denoted
$\mathcal{M}_A$ when instantiated with strategy matrix A.

Proposition 3 (Matrix Mechanism). Let A be a full
rank m × n strategy matrix, let W be any p × n workload
matrix, and let b be a length-m column vector consisting of
independent samples from a Laplace distribution with scale
1. Then the randomized algorithm $\mathcal{M}_A$ that outputs the following
vector is ε-differentially private:

$$\mathcal{M}_A(W, x) = Wx + \left(\frac{\Delta_A}{\epsilon}\right) W A^+ b.$$

Proof. The expression above can be rewritten as follows:

$$\begin{aligned}
\mathcal{M}_A(W, x) &= W\left(x + \left(\tfrac{\Delta_A}{\epsilon}\right) A^+ b\right) \\
&= W A^+ \left(Ax + \left(\tfrac{\Delta_A}{\epsilon}\right) b\right) \\
&= W A^+ \mathcal{L}(A, x).
\end{aligned}$$

Thus, $\mathcal{M}_A(W, x)$ is simply a post-processing of the output
of the ε-differentially private $\mathcal{L}$ and therefore $\mathcal{M}_A$ is also
ε-differentially private. □
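The two equivalent forms in the proof can be checked numerically. The sketch below is illustrative only: the function names, workload, and frequency vector are our assumptions, and the noise vector b is passed in explicitly so that both forms can be compared on the same draw. Arithmetic is done over exact rationals, so the equality holds exactly rather than up to rounding.

```python
import random
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a square invertible matrix, over Fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pinv(A):
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)

def sensitivity(Q):
    return max(sum(abs(row[j]) for row in Q) for j in range(len(Q[0])))

def matrix_mechanism(W, A, x, eps, b):
    """Answer workload W through strategy A, given a length-m noise vector b."""
    scale = Fraction(sensitivity(A)) / Fraction(eps)
    # Step 1: Laplace mechanism on the strategy queries.
    y = [t + scale * Fraction(n) for t, n in zip(matvec(A, x), b)]
    # Step 2: least squares estimate of x, then derive workload answers.
    x_hat = matvec(pinv(A), y)
    return matvec(W, x_hat)

H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W = [[1, 1, 0, 0], [0, 1, 1, 0]]   # hypothetical workload of two range queries
x = [10, 20, 30, 40]               # hypothetical frequency vector

rng = random.Random(0)
b = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in H4]  # Laplace(1) draws
answers = matrix_mechanism(W, H4, x, 1, b)
print([float(a) for a in answers])
```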
Figure 2: For strategy A equal to I4, H4 and Y4, respectively, the matrices above are used to derive the
estimate $\hat{x}_A$ from the noisy output $y = \mathcal{L}(A, x)$. Row i in each matrix contains the coefficients of the linear
combination of y used to construct an estimate for count $x_i$. The inverse of the identity I4 is the identity;
the reconstruction matrix for H4 is $H_4^+ = (H_4^t H_4)^{-1} H_4^t$; $Y_4^{-1}$ describes the wavelet reconstruction coefficients.



Like the Laplace mechanism, the matrix mechanism computes
the true answer, Wx, and adds to it a noise vector.
But in the matrix mechanism the independent Laplace noise
b is transformed by the matrix $W A^+$ and then scaled by
$\Delta_A / \epsilon$. The potential power of the mechanism arises from
precisely these two features. First, the scaling is proportional
to the sensitivity of A instead of the sensitivity of W,
and the former may be lower for carefully chosen A. Second,
because the noise vector b consists of independent samples,
the Laplace mechanism adds independent noise to each query
answer. However, in the matrix mechanism, the noise vector
b is transformed by $W A^+$. The resulting noise vector
is a linear combination of the independent samples from b,
and thus it is possible to add correlated noise to query answers,
which can result in more accurate answers for query
workloads whose queries are correlated.

3.3 Rank deficient strategies and workloads 

If a workload is not full rank, then it follows from Defini- 
tion 3.1 that a full rank strategy is not needed. Instead, any 
strategy whose rowspace spans the workload queries will suffice.
However, if we wish to consider a rank deficient strategy
B, we can always transform B into a full rank strategy A by
adding a scaled identity matrix δI, where δ approaches zero.
The result is a full rank matrix A which supports all queries,
but for which the error for all queries not supported by B
will be extremely high. Another alternative is to apply dimension
reduction techniques, such as principal component
analysis, to the workload queries to derive a transformation
of the domain in which the workload has full rank. Then the 
choice of full rank strategy matrix can be carried out in the 
reduced domain. 

4. THE ANALYSIS OF ERROR 

In this section we analyze the error of the matrix mechanism
formally, providing closed-form expressions for the
mean squared error of query estimates. We then use matrix
decomposition to reveal the properties of the strategy
that determine error. This analysis is the foundation for the 
optimization problems we address in the following section. 

4.1 The error of query estimates 

While a full rank query strategy A can be used to com- 
pute an estimate for any linear query w, the accuracy of the 
estimate varies based on the relationship between w and A. 
We measure the error of strategy A on query w using mean 
squared error. 

Definition 4.1 (Query and Workload Error). Let
$\hat{x}_A$ be the estimate for x derived from query strategy A. The
mean squared error of the estimate for w using strategy A
is:

$$\mathrm{Error}_A(w) = E[(wx - w\hat{x}_A)^2].$$

Given a workload W, the total mean squared error of answering
W using strategy A is:

$$\mathrm{TotalError}_A(W) = \sum_{w_i \in W} \mathrm{Error}_A(w_i).$$

For any query strategy A the following proposition de- 
scribes how to compute the error for any linear query w and 
the total error for any workload W: 

Proposition 4 (Error Under Strategy A). For a full
rank query matrix A and linear query w, the estimate of w is
unbiased (i.e., $E[w\hat{x}_A] = wx$), and the error of the estimate
of w using A is equal to:

$$\mathrm{Error}_A(w) = \left(\frac{2}{\epsilon^2}\right) \Delta_A^2 \, w (A^t A)^{-1} w^t. \qquad (1)$$

The total error of the estimates of workload W using A is:

$$\mathrm{TotalError}_A(W) = \left(\frac{2}{\epsilon^2}\right) \Delta_A^2 \, \mathrm{trace}\left((A^t A)^{-1} W^t W\right). \qquad (2)$$

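Proof. It is unbiased because $\hat{x}_A$ is unbiased. Thus, for
formula (1), the mean squared error is equal to the variance:

$$\begin{aligned}
\mathrm{Error}_A(w) = Var(w\hat{x}_A) &= Var\left(wx + \left(\tfrac{\Delta_A}{\epsilon}\right) w A^+ b\right) \\
&= \left(\tfrac{\Delta_A}{\epsilon}\right)^2 Var(w A^+ b).
\end{aligned}$$

With algebraic manipulation and the fact that $Var(b) = 2I_m$, we
get:

$$\begin{aligned}
Var(w A^+ b) &= w A^+ \, Var(b) \, (w A^+)^t \\
&= w A^+ \, 2I_m \, (w A^+)^t \\
&= 2 w (A^t A)^{-1} A^t A \left((A^t A)^{-1}\right)^t w^t \\
&= 2 w (A^t A)^{-1} w^t,
\end{aligned}$$

where $((A^t A)^{-1})^t = (A^t A)^{-1}$ because the matrix is symmetric.
Therefore, $\mathrm{Error}_A(w) = \left(\frac{\Delta_A}{\epsilon}\right)^2 2 w (A^t A)^{-1} w^t$.

For formula (2), if $w_i$ is row i of workload W, then $\mathrm{Error}_A(w_i)$
is the i-th entry on the diagonal of matrix $\left(\frac{2}{\epsilon^2}\right) \Delta_A^2 \, W (A^t A)^{-1} W^t$.
Therefore, since the trace of a matrix is the sum of the values
on its diagonal, $\mathrm{TotalError}_A(W)$ is equal to

$$\left(\frac{2}{\epsilon^2}\right) \Delta_A^2 \, \mathrm{trace}\left(W (A^t A)^{-1} W^t\right).$$

Formula (2) follows from a standard property of the trace:
$\mathrm{trace}(W (A^t A)^{-1} W^t) = \mathrm{trace}((A^t A)^{-1} W^t W)$. □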
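Formulas (1) and (2) are straightforward to evaluate exactly for the running examples. The sketch below is illustrative: the helper names and the choice ε = 1 are ours, and the matrices are transcribed from the description in Example 1.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a square invertible matrix, over Fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def sensitivity(Q):
    return max(sum(abs(row[j]) for row in Q) for j in range(len(Q[0])))

def error(A, w, eps=1):
    """Formula (1): (2 / eps^2) * sensitivity(A)^2 * w (A^t A)^{-1} w^t."""
    M = inverse(matmul(transpose(A), A))
    profile = matmul(matmul([w], M), transpose([w]))[0][0]
    return Fraction(2) / Fraction(eps) ** 2 * sensitivity(A) ** 2 * profile

def total_error(A, W, eps=1):
    """Formula (2): (2 / eps^2) * sensitivity(A)^2 * trace((A^t A)^{-1} W^t W)."""
    M = inverse(matmul(transpose(A), A))
    T = matmul(M, matmul(transpose(W), W))
    trace = sum(T[i][i] for i in range(len(T)))
    return Fraction(2) / Fraction(eps) ** 2 * sensitivity(A) ** 2 * trace

I4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Y4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]

for name, A in [("I4", I4), ("H4", H4), ("Y4", Y4)]:
    print(name, error(A, [1, 0, 0, 0]), error(A, [1, 1, 1, 1]))
# I4 2 8
# H4 78/7 72/7
# Y4 27/4 18
```

The total error of a workload equals the sum of the per-query errors over its rows, which gives a simple consistency check between the two formulas.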



These formulas underlie much of the remaining discussion
in the paper. Formula (1) shows that, for a fixed ε, error is
determined by two properties of the strategy: (i) its squared
sensitivity, $\Delta_A^2$; and (ii) the term $w (A^t A)^{-1} w^t$. In the sequel,
we refer to the former as simply the sensitivity term.
We refer to the latter term as the profile term and we call
matrix $(A^t A)^{-1}$ the error profile of query strategy A.

Definition 4.2 (Error profile). For any full rank m × n
query matrix A, the error profile of A, denoted M, is defined
to be the n × n matrix $(A^t A)^{-1}$.

The coefficients of the error profile $M = (A^t A)^{-1}$ measure
the covariance of terms in the estimate $\hat{x}_A$. Element $m_{ii}$
measures the variance of the estimate of $x_i$ in $\hat{x}_A$ (and is
always positive), while $m_{ij}$ measures the covariance of the
estimates of $x_i$ and $x_j$ (and may be negative). We can
equivalently express the profile term as:

$$w (A^t A)^{-1} w^t = \sum_i w_i^2 m_{ii} + \sum_{i<j} 2 w_i w_j m_{ij},$$

which shows that error is a weighted sum of the (positive)
diagonal variance terms of M, plus a (possibly negative) linear
combination of off-diagonal covariance terms. This illustrates
that it is possible to have a strategy that has relatively
high error on individual counts yet is quite accurate for other
queries that are linear combinations of the individual counts.
We analyze instances of such strategies in Sec. 6.
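To see the negative covariance terms at work, the sketch below splits the profile term into its diagonal and off-diagonal parts for the error profile of H4 (entries as given in Figure 3(b)). The query vectors are our choice of illustration.

```python
from fractions import Fraction

# Error profile of H4, (H4^t H4)^{-1}, as given in Figure 3(b).
M = [[Fraction(v, 21) for v in row] for row in
     [[13, -8, -1, -1],
      [-8, 13, -1, -1],
      [-1, -1, 13, -8],
      [-1, -1, -8, 13]]]

def profile_parts(w, M):
    """Split w M w^t into (diagonal variance part, off-diagonal covariance part)."""
    n = len(w)
    diag = sum(w[i] ** 2 * M[i][i] for i in range(n))
    off = sum(2 * w[i] * w[j] * M[i][j] for i in range(n) for j in range(i + 1, n))
    return diag, off

for w in ([1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]):
    diag, off = profile_parts(w, M)
    print(w, diag, off, diag + off)
# [1, 0, 0, 0] 13/21 0 13/21
# [1, 1, 0, 0] 26/21 -16/21 10/21
# [1, 1, 1, 1] 52/21 -40/21 4/7
```

The query summing two counts has a smaller profile term (10/21) than a single count (13/21), because the covariance contribution is negative.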

Example 5. Figure 3 shows the error profiles for each
sample strategy. I4 has the lowest error for queries that ask
for a single count of x, such as w = [1,0,0,0]. For such
queries the error is determined by the diagonal of the error
profile (subject to scaling by the sensitivity term). Queries
that involve more than one count will sum terms off the main
diagonal and these terms can be negative for the profiles of
H4 and Y4. Despite the higher sensitivity of these two strategies,
the overall error for queries that involve many counts,
such as w = [1,1,1,1], approaches that of I4. The small
dimension of these examples hides the extremely poor performance
of $I_n$ on queries that involve many counts.

The next example uses Prop. 4 to gain insight into the 
behavior of some natural strategies for answering a workload 
of queries. 

Example 6. If A is the identity matrix, then Prop. 4
implies that the total error will depend only on the workload,
since the sensitivity of $I_n$ is 1:

$$\mathrm{TotalError}_{I_n}(W) = \left(\frac{2}{\epsilon^2}\right) \mathrm{trace}(W^t W).$$

Here the trace of $W^t W$ is the sum of squared coefficients of
each query. This will tend to be a good strategy for workloads
that sum relatively few counts. Assuming the workload is
full rank, we can use the workload itself as a strategy, i.e.
A = W. Then Prop. 4 implies that the total error is:

$$\mathrm{TotalError}_W(W) = \left(\frac{2}{\epsilon^2}\right) \Delta_W^2 \, n,$$

since $\mathrm{trace}((W^t W)^{-1} W^t W) = \mathrm{trace}(I_n) = n$. In this case
the trace term is low, but the strategy will perform badly if
$\Delta_W$ is high. Note that if W is m × n, the total error of
the Laplace mechanism for W will be $\left(\frac{2}{\epsilon^2}\right) \Delta_W^2 \, m$, which is
worse than the matrix mechanism whenever m > n.

In some sense, good strategies fall between the two extremes
above: they should have sensitivity less than the workload but
a trace term better than the identity strategy.
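The three totals in Example 6 can be compared on a concrete workload. The sketch below takes W = H4 (our choice; it is full rank, so it can serve as its own strategy) with ε = 1.

```python
from fractions import Fraction

def sensitivity(Q):
    return max(sum(abs(row[j]) for row in Q) for j in range(len(Q[0])))

def trace_gram(W):
    """trace(W^t W): the sum of squared coefficients over all queries."""
    return sum(v * v for row in W for v in row)

W = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
eps = Fraction(1)            # illustrative privacy parameter
m, n = len(W), len(W[0])
c = 2 / eps ** 2

identity_error = c * trace_gram(W)              # strategy A = I_n
workload_error = c * sensitivity(W) ** 2 * n    # strategy A = W
laplace_error = c * sensitivity(W) ** 2 * m     # Laplace mechanism applied to W

print(identity_error, workload_error, laplace_error)  # 24 72 126
```

For this small workload the identity strategy happens to give the lowest total of the three, and using W as its own strategy improves on the plain Laplace mechanism because m = 7 > n = 4.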

4.2 Error profile decomposition 

Because an error profile matrix M is equal to $(A^t A)^{-1}$ for
some A, it has a number of special properties. M is always a
square (n × n) matrix, it is symmetric, and even further, it is
always a positive definite matrix. Positive definite matrices
M are such that $w M w^t > 0$ for all non-zero vectors w. In
our setting this means that the profile term is always positive,
as expected. Furthermore, the function $f = w M w^t$
is a quadratic function if w is viewed as a vector of variables.
Then the function f is an elliptic paraboloid over
n-dimensional space. If we consider the equation $w M w^t = 1$,
this defines an ellipsoid centered at the origin (the solutions
to this equation are the queries whose profile term is one).
We can think of the error function of strategy A as a scaled
version of this paraboloid, where the scale factor is $\left(\frac{2}{\epsilon^2}\right) \Delta_A^2$.

To gain a better understanding of the error profile, we con- 
sider its decomposition. Recall that a matrix is orthogonal 
if its transpose is its inverse. 

Definition 4.3 (Decomposition of Profile). Let M
be any n × n positive definite matrix. The spectral decomposition
of M is a factorization of the form $M = P_M D_M P_M^t$,
where $D_M$ is an n × n diagonal matrix containing the eigenvalues
of M, and $P_M$ is an orthogonal n × n matrix containing
the eigenvectors of M.

Thus the matrices $D_M$ and $P_M$ fully describe the error
profile. They also have an informative geometric
interpretation. The entries of the diagonal matrix $D_M$ describe the
relative stretch of the axes of the ellipsoid. The matrix $P_M$
is orthogonal, representing a rotation of the ellipsoid. In
high dimensions, a set of common eigenvalues means that
the ellipsoid is spherical with respect to the corresponding
eigenspace. For example, the profile of $I_4$ is fully spherical
(all eigenvalues are one), but by choosing unequal
eigenvalues and a favorable rotation, the error is reduced for certain
queries.
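As an illustration of Definition 4.3, the sketch below decomposes the error profile of a small hierarchical strategy with NumPy's symmetric eigensolver. The matrix `H4` is written out from the verbal description of the hierarchical scheme (total, two halves, four unit counts) and is an assumption about the row order used in Fig. 1; the row order does not affect $H_4^{T}H_4$.

```python
import numpy as np

# Assumed layout of H4: total, left half, right half, then the four unit counts.
H4 = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

M = np.linalg.inv(H4.T @ H4)       # error profile (A^T A)^{-1}

# eigh is intended for symmetric matrices; eigenvalues come back ascending.
eigvals, P = np.linalg.eigh(M)
D = np.diag(eigvals)

reconstructed = P @ D @ P.T        # should reproduce M = P_M D_M P_M^T
print(eigvals)                     # stretch of each ellipsoid axis
```

Every eigenvalue is positive, as positive definiteness requires, and the repeated eigenvalue corresponds to a subspace in which the ellipsoid is spherical.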

4.3 Strategy matrix decomposition 

Despite the above insights into tuning the error profile,
the matrix mechanism requires the choice of a strategy
matrix, not simply a profile. Next we focus on the relationship
between strategies and their profile matrices.

We will soon see that more than one strategy can result 
in a given error profile. Accordingly, we define the following 
equivalence on query strategies: 

Definition 4.4 (Profile Equivalence). Two query
matrices A and B are profile equivalent if their error profiles
match, i.e. $(A^{T}A)^{-1} = (B^{T}B)^{-1}$.

A key point is that two profile equivalent strategies may 
have different sensitivity. If A and B are profile equivalent, 
but A has lower sensitivity, then strategy A dominates strat- 
egy B: the estimate for any query will have lower error using 
strategy A. 

Example 7. Recall that strategies $Y_4$ and $H_4$ both have
sensitivity 3. This is not the minimal sensitivity for strategies
achieving either of these error profiles. A square matrix $H'$,
profile equivalent to $H_4$, is shown in Figure 4(a). This matrix
has sensitivity $\Delta_{H'} = 2.896$. A matrix $Y'$, profile equivalent
to $Y_4$, is shown in Figure 4(b). This matrix has sensitivity
$\Delta_{Y'} = 2.210$.

$$\text{(a) } (I_4^{T}I_4)^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$\text{(b) } (H_4^{T}H_4)^{-1} = \frac{1}{21}\begin{bmatrix} 13 & -8 & -1 & -1 \\ -8 & 13 & -1 & -1 \\ -1 & -1 & 13 & -8 \\ -1 & -1 & -8 & 13 \end{bmatrix}$$

$$\text{(c) } (Y_4^{T}Y_4)^{-1} = \frac{1}{8}\begin{bmatrix} 3 & -1 & 0 & 0 \\ -1 & 3 & 0 & 0 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & -1 & 3 \end{bmatrix}$$

Figure 3: The profile term of the error function for query w on strategy A is $w(A^{T}A)^{-1}w^{T}$. Shown are the error
profiles for query strategies $I_4$, $H_4$, and $Y_4$. Every error profile matrix is symmetric and positive definite.

(a) $H'$ is profile equivalent to $H_4$, but
$\Delta_{H'} = 2.896$ while $\Delta_{H_4} = 3$

$$Y' = \begin{bmatrix} 1.73 & 0.58 & 0.00 & 0.00 \\ 0.00 & 1.63 & 0.00 & 0.00 \\ 0.00 & 0.00 & 1.73 & 0.58 \\ 0.00 & 0.00 & 0.00 & 1.63 \end{bmatrix}$$

(b) $Y'$ is profile equivalent to $Y_4$,
but $\Delta_{Y'} = 2.210$ while $\Delta_{Y_4} = 3$

Figure 4: When two strategies A and B are profile
equivalent, the one with lower sensitivity dominates.
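The numbers in Figure 4(b) can be reproduced with a short check. This is a sketch under two assumptions: `Y4` is the 4 x 4 Haar matrix written out from the verbal description of the wavelet scheme, and a Cholesky factorization is used as one convenient way to obtain a profile-equivalent matrix (the text does not say how $Y'$ was derived).

```python
import numpy as np

def sensitivity(A):
    """L1 sensitivity: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()

# Assumed Haar strategy for n = 4: total, half difference, two pair differences.
Y4 = np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1,  0,  0],
    [0,  0,  1, -1],
], dtype=float)

gram = Y4.T @ Y4
# np.linalg.cholesky returns lower-triangular L with L @ L.T == gram,
# so R = L.T is upper triangular and satisfies R.T @ R == gram.
Y_prime = np.linalg.cholesky(gram).T

print(np.round(Y_prime, 2))
print(sensitivity(Y4), round(sensitivity(Y_prime), 3))
```

Because `Y_prime.T @ Y_prime` equals `Y4.T @ Y4`, the two strategies share an error profile, yet the triangular factor has a smaller maximum column sum, so it dominates $Y_4$.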

To analyze strategy matrices we again use matrix
decomposition; however, because a strategy matrix may not be
symmetric, we use the singular value decomposition.

Definition 4.5 (Decomposition of Strategy). Let A
be any m x n query strategy. The singular value
decomposition (SVD) of A is a factorization of the form
$A = Q_A D_A P_A^{T}$ such that $Q_A$ is an m x m orthogonal matrix, $D_A$
is an m x n diagonal matrix and $P_A$ is an n x n orthogonal
matrix. When m > n, the diagonal matrix $D_A$ consists of
an n x n diagonal submatrix combined with $0^{(m-n) \times n}$.

The following theorem shows how the decompositions of 
the error profile and strategy are related. It explains exactly 

how a strategy matrix determines the parameters for the er- 
ror profile, and it fully defines the set of all profile equivalent 
strategies. 

Theorem 2. Let $m \ge n$ and let M be any n x n
positive definite matrix decomposed as $M = P_M D_M P_M^{T}$ where
$D_M = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Then for any m x n matrix A, the
following are equivalent:

(i) A achieves the profile M, that is $(A^{T}A)^{-1} = M$;

(ii) There is a decomposition of A into $A = Q_A D_A P_A^{T}$
where $Q_A$ is an m x m orthogonal matrix, $D_A$ is an m x n
matrix equal to $\mathrm{diag}(1/\sqrt{\lambda_1}, \dots, 1/\sqrt{\lambda_n})$ plus $0^{(m-n) \times n}$,
and $P_A = P_M$.



Proof. Given (i), let $D' = \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n})$, and since
$P_M$ is an orthogonal matrix, $P_M^{T} = P_M^{-1}$. Then

$$D'^{T}P_M^{T}A^{T}AP_M D' = D'^{T}P_M^{T}M^{-1}P_M D' = D'^{T}P_M^{T}P_M D_M^{-1}P_M^{T}P_M D' = I_n.$$

Thus $A = Q'_A D'^{-1}P_M^{T}$ where $Q'_A$ is an m x n matrix
whose column vectors are unit length and orthogonal to each
other. Let $Q_A$ be an m x m orthogonal matrix whose first n
columns are $Q'_A$. Let $D_A$ be an m x n matrix equal to $D'^{-1}$
plus $0^{(m-n) \times n}$, which is equivalent to $\mathrm{diag}(1/\sqrt{\lambda_1}, \dots, 1/\sqrt{\lambda_n})$
plus $0^{(m-n) \times n}$. Then

$$Q_A D_A P_M^{T} = Q'_A D'^{-1}P_M^{T} = A.$$

Given (ii) we have $A = Q_A D_A P_M^{T}$ and we first compute

$$A^{T}A = (Q_A D_A P_M^{T})^{T}(Q_A D_A P_M^{T}) = (P_M D_A^{T}Q_A^{T})(Q_A D_A P_M^{T}) = P_M D_A^{T}D_A P_M^{T}.$$

Note that while $D_A$ may be m x n, $D_A^{T}D_A$ is an n x n diagonal
matrix equal to $\mathrm{diag}(1/\lambda_1, \dots, 1/\lambda_n)$.

Then $(A^{T}A)^{-1} = (P_M D_A^{T}D_A P_M^{T})^{-1} = P_M (D_A^{T}D_A)^{-1} P_M^{T}$.
Then since $(D_A^{T}D_A)^{-1} = \mathrm{diag}(\lambda_1, \dots, \lambda_n) = D_M$ we conclude
that $(A^{T}A)^{-1} = P_M D_M P_M^{T}$. □

This theorem has a number of implications that inform 
our optimization problem. First, it shows that given any 
error profile M, we can construct a strategy that achieves 
the profile. We do so by decomposing M and constructing a 
strategy A from its eigenvectors (which are contained in Pm 
and inherited by Pa) and the diagonal matrix consisting of 
the inverse square root of its eigenvalues (this is Da, with 
no zeroes added). We can simply choose Q as the nxn 
identity matrix, and then matrix DaPa is an n x n strategy 
achieving M. 

Second, the theorem shows that there are many such strate- 
gies achieving M, and that all of them can be constructed 
in a similar way. There is a wrinkle here only because some 
of these strategies may have more than n rows. That case is 
covered by the definition of Da, which allows one or more 
rows of zeroes to be added to the diagonal matrix derived 
from the eigenvalues of M. After adding zeroes, $D_A$ becomes
m x n, we choose any m x m orthogonal matrix $Q_A$, and we
have an m x n strategy achieving M.

Third, the theorem reveals that the key parameters of the 
error profile corresponding to a strategy A are determined 
by the Da and Pa matrices of the strategy's decomposition. 
For a fixed profile, the Qa of the strategy has no impact on 
the profile, but does alter the sensitivity of the strategy. Ulti- 
mately this means that choosing an optimal strategy matrix 
requires determining a profile (Da and Pa), and choosing a 
rotation (Qa) that controls sensitivity. The rotation should 
be the one that minimizes sensitivity, otherwise the strategy 
will be dominated. 
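The construction described by Theorem 2 can be sketched in a few lines. The code below assumes NumPy and a randomly generated positive definite profile (purely illustrative); it builds a square strategy with $Q = I_n$ and a taller one with a random orthogonal $Q$, and checks that both achieve the same profile. The variable and helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6

def profile(A):
    """Error profile (A^T A)^{-1} of a full-rank strategy."""
    return np.linalg.inv(A.T @ A)

def sensitivity(A):
    """L1 sensitivity: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()

# Hypothetical target profile: any symmetric positive definite matrix.
B = rng.normal(size=(n, n))
M = B @ B.T + n * np.eye(n)

lam, P = np.linalg.eigh(M)                 # M = P diag(lam) P^T
D_A = np.diag(1.0 / np.sqrt(lam))          # inverse square roots of eigenvalues

A_square = D_A @ P.T                       # Q chosen as the n x n identity

# Taller strategy: pad D_A with zero rows, then rotate by an m x m orthogonal Q.
D_tall = np.vstack([D_A, np.zeros((m - n, n))])
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))
A_tall = Q @ D_tall @ P.T

print(sensitivity(A_square), sensitivity(A_tall))
```

Both strategies have profile M, but their sensitivities generally differ, which is why the choice of rotation matters for the final error.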



We cannot find an optimal strategy by optimizing either of
these factors independently. Optimizing only for the
sensitivity of the strategy severely limits the error profiles possible
(in fact, the identity matrix is a full rank strategy with least
sensitivity). If we optimize only the profile, we may choose
a profile with a favorable "shape" but this could result in a
prohibitively high least sensitivity. Therefore we must
optimize jointly for both the profile and the sensitivity, and
we address this challenge next.

5. OPTIMIZATION 

In this section, we provide techniques for determining or 
approximating optimal query strategies for the matrix mech- 
anism, and we also give some heuristic strategies that may 
improve existing strategies. We first state our main problem. 

Problem 1 (minError). Given a workload matrix W,
find the strategy A that minimizes $\mathrm{TotalError}_A(W)$.

The MinError problem is difficult for two reasons. First,
the sensitivity $\Delta_A$ is the maximum function applied to the
L1 norms of column vectors, which is not differentiable.
Second, we do not believe MinError is expressible as a convex
optimization problem since the set of all query strategies that
support a given workload W is not convex: if A supports
W, then $-A$ also supports W but $\frac{1}{2}(A + (-A)) = 0$ does
not support W.

In Section 5.1 we show that MinError can be expressed 
as a semidefinite program with rank constraints. While rank 
constraints make the semidefinite program non-convex, there 
are algorithms that can solve such problems by iteratively 
solving a pair of related semidefinite programs. 

Though the set of all query strategies A that support a
given workload W is not convex, the set of all possible
matrices $A^{T}A$ is convex. In Sec. 5.2 we provide two approaches
for finding approximate solutions based on bounding $\Delta_A$ by
a function of $A^{T}A$ rather than a function of A. While each
technique results in a strategy, they essentially select a
profile and a default rotation Q. Error bounds are derived by
reasoning about the default rotation. It follows that both
of these approximations can be improved by considering
rotations Q that reduce sensitivity. Therefore we also consider a
secondary optimization problem.

Problem 2 (minSensitivity). Given a query matrix A,
find the query matrix B that is profile equivalent to A and
has minimum sensitivity.

Unfortunately this subproblem is still not a convex
problem, since the set of all query matrices that are profile
equivalent is also not convex. Notice A is profile equivalent to $-A$
but is not profile equivalent to $\frac{1}{2}(A + (-A)) = 0$. Again the
problem can be expressed as an SDP with rank constraints,
as shown in Sec. 5.4.

5.1 Solution to the MinError Problem 

In this section we formulate the MinError problem for an 
n X n workload W. It is sufficient to focus on n x n workloads 
because Proposition 4 shows that the strategy that minimizes 
the total error for some workload W also minimizes the total 
error for any workload V such that V*V = W*W. There- 
fore, if given an m x n workload for m > n, we can use spec- 
tral decomposition to transform it into an equivalent n x n 
workload. 
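The reduction from an m x n workload to an equivalent n x n workload can be sketched as follows. This is one possible realization, assuming NumPy: it takes the spectral decomposition of $W^{T}W$ and returns $V = D^{1/2}P^{T}$, so that $V^{T}V = W^{T}W$. The function name `square_equivalent` and the example workload are ours.

```python
import numpy as np

def square_equivalent(W):
    """Return an n x n workload V with V^T V == W^T W."""
    lam, P = np.linalg.eigh(W.T @ W)   # W^T W = P diag(lam) P^T
    lam = np.clip(lam, 0.0, None)      # guard against tiny negative round-off
    return np.diag(np.sqrt(lam)) @ P.T

# Hypothetical 6 x 3 workload: all range queries over a domain of size 3.
W = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
], dtype=float)

V = square_equivalent(W)
print(V.shape)
```

Since the trace term $\mathrm{trace}(W(A^{T}A)^{-1}W^{T})$ depends on W only through $W^{T}W$, any strategy has the same total error on V as on W.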



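Program 5.1 Minimizing the Total Error

Given: $W \in \mathbb{R}^{n \times n}$
Minimize: $u_1 + u_2 + \cdots + u_n$
Subject to: For $i \in [n]$: $e_i$ is the n dimensional column vector whose
$i$th entry is 1 and remaining entries are 0.

$$\begin{bmatrix} 2I_m & -AW^{-1} & 0 \\ -(AW^{-1})^{T} & (W^{T})^{-1}ZW^{-1} & e_i \\ 0 & e_i^{T} & u_i \end{bmatrix} \succeq 0 \qquad (3)$$

For $i \in [n], j \in [m]$:

$$c_{ji} \ge a_{ji}, \quad c_{ji} \ge -a_{ji}, \quad \sum_{k=1}^{m} c_{ki} \le 1 \qquad (4)$$

$$\mathrm{rank}\begin{bmatrix} I_m & A \\ A^{T} & Z \end{bmatrix} = m \qquad (5)$$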


Theorem 3. Given an n x n workload W, Program 5.1 is
a semidefinite program with rank constraint whose solution
is the tuple (A, C, u, Z), and the m x n strategy A minimizes
$\mathrm{TotalError}_A(W)$ among all m x n strategies.

Proof. In Program 5.1, $u_1 + \cdots + u_n$ is an upper bound
on the total error (modulo constant factors). The rank
constraint in Eq. (5) makes sure that $Z = A^{T}A$.

The semidefinite constraint, Eq. (3), ensures that $u_i$ is
an upper bound on twice the error of the $i$th query in the
workload, ignoring for the moment the sensitivity term:

$$u_i \ge 2\,(W(A^{T}A)^{-1}W^{T})_{ii}.$$

To show this, let X be the (m + n) x (m + n) upper left
submatrix of the matrix in Eq. (3), substituting $A^{T}A$ for Z:

$$X = \begin{bmatrix} 2I_m & -AW^{-1} \\ -(AW^{-1})^{T} & (W^{T})^{-1}A^{T}AW^{-1} \end{bmatrix}$$

and then

$$X^{-1} = \begin{bmatrix} (2I_m - AA^{+})^{-1} & (WA^{+})^{T} \\ WA^{+} & 2W(A^{T}A)^{-1}W^{T} \end{bmatrix}.$$

The semidefinite constraints in Eq. (3) are equivalent to:

$$\forall i,\quad u_i \ge (X^{-1})_{m+i,\,m+i} = 2\,(W(A^{T}A)^{-1}W^{T})_{ii}.$$

Thus, minimizing $u_1 + \cdots + u_n$ is equivalent to minimizing
the trace of $W(A^{T}A)^{-1}W^{T}$. To make $u_1 + \cdots + u_n$ a bound
on the total error, we must show that $\Delta_A = 1$. The
constraints in Eq. (4) ensure that $\Delta_A \le 1$. To see that $\Delta_A \ge 1$,
observe that $(kX)^{-1} = \frac{1}{k}X^{-1}$. So $u_1 + \cdots + u_n$ is minimized
when $\Delta_A = 1$ because otherwise we can multiply X (which
contains A) by a constant to make $u_1 + \cdots + u_n$ smaller.
Putting these together, we have

$$\begin{aligned} u_1 + u_2 + \cdots + u_n &= 2\sum_{i=1}^{n} (W(A^{T}A)^{-1}W^{T})_{ii} \\ &= 2\,\mathrm{trace}(W(A^{T}A)^{-1}W^{T}) \\ &= 2\Delta_A^2\,\mathrm{trace}(W(A^{T}A)^{-1}W^{T}) \\ &= \epsilon^2\,\mathrm{TotalError}_A(W) \end{aligned}$$

with $\epsilon$ fixed. □



Thus Theorem 3 provides the best strategy to the MinError 
problem with at most m queries. Observe that if the optimal 
strategy has m' < m queries, then Program 5.1 will return
an m x n matrix with m - m' rows of 0s. In addition, if the
workload contains queries with coefficients in {—1,0, 1}, we 
can show that is upper bound on the number of queries 
in the optimal strategy. 

Dattorro [4] shows that solving a semidefinite program
with rank constraints can be converted into solving two semi- 
definite programs iteratively. The convergence follows the 
widely used trace heuristic for rank minimization. We are 
not aware of results that quantify the number of iterations 
that are required for convergence. However, notice it takes 
0{n^) time to solve a semidefinite program with an n x n 
semidefinite constraint matrix and in Program 5.1, there are 
n semidefinite constraint matrices with size m + n, which can 
be represented as a semidefinite constraint matrix with size 
n(m + n). Thus, the complexity of solving our semidefinite 
program with rank constraints is at least 0{rn'^n'^). 

5.2 Approximations to the MinError problem 

As mentioned above, the MinError problem can be
simplified by bounding the sensitivity of A with some properties
of $A^{T}A$. Here we introduce two approximation methods that
use this idea and can be computed efficiently: the L2
approximation (5.2.1), and the singular value bound approximation
(5.2.2). Error bounds on both methods can be measured by
providing upper bounds to $\Delta_A$.

5.2.1 L2 approximation

Note that the diagonal entries of $A^{T}A$ are the squared L2
norms of column vectors of A. For sensitivity, recall that
we are interested in the maximum L1 norm of the column
vectors of A. This observation leads to the following
approaches: we can either use the L2 norm to bound
the L1 norm, or we can relax the definition of differential
privacy by measuring the sensitivity in terms of L2 rather
than L1.

Using the L2 norm to bound the L1 norm. Instead
of MinError, we can solve the following L2 approximation
problem. We use $\|A\|_2$ to denote the maximum L2 norm of
column vectors of A.

Problem 3 (L2 approximation). Given a workload
matrix W, find the strategy A that minimizes

$$\|A\|_2^2\,\mathrm{trace}(W(A^{T}A)^{-1}W^{T}).$$

According to the basic property of L norms, for any vector
v of dimension n, $\|v\|_2 \le \|v\|_1 \le \sqrt{n}\,\|v\|_2$. Therefore we
can bound the approximation rate of the L2 approximation.

Theorem 4. Given a workload W, let A be the optimal
solution to the minError problem and A' be the optimal
solution to the L2 approximation. Then

$$\mathrm{TotalError}_{A'}(W) \le n\,\mathrm{TotalError}_{A}(W).$$

Notice the L2 bound is equal to the L1 bound if all queries
in strategy A are uncorrelated, so that the L2
approximation gives the optimal strategy if the optimal strategy only
contains uncorrelated queries such as $I_n$.
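The factor of n in Theorem 4 can be traced through a short chain of inequalities. What follows is a sketch rather than the paper's own proof. It assumes that A' has n rows, so that its columns are n-dimensional, writes $t(A)$ as our shorthand for the trace term, and drops the common factor $2/\epsilon^2$.

```latex
% Shorthand (ours): t(A) = trace(W (A^T A)^{-1} W^T).
\begin{align*}
\Delta_{A'}^{2}\, t(A')
  &\le n\,\|A'\|_2^{2}\, t(A')
     && \text{since } \|v\|_1 \le \sqrt{n}\,\|v\|_2 \text{ for each column of } A' \\
  &\le n\,\|A\|_2^{2}\, t(A)
     && \text{since } A' \text{ minimizes } \|\cdot\|_2^{2}\, t(\cdot) \\
  &\le n\,\Delta_{A}^{2}\, t(A)
     && \text{since } \|v\|_2 \le \|v\|_1 \text{ for each column of } A .
\end{align*}
```

The first and last terms are, up to the dropped constant, $\mathrm{TotalError}_{A'}(W)$ and $n\,\mathrm{TotalError}_{A}(W)$.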

Program 5.2 L2 approximation

Given: $W \in \mathbb{R}^{n \times n}$.
Minimize: $u_1 + u_2 + \cdots + u_n$.
Subject to: For $i \in [n]$: $e_i$ is the n dimensional column vector whose
$i$th entry is 1 and other entries are 0.

$$\begin{bmatrix} X & e_i \\ e_i^{T} & u_i \end{bmatrix} \succeq 0;$$

$$(W^{T}XW)_{ii} \le 1, \quad i \in [n].$$

Relaxing the definition of differential privacy. L2
norms can also be applied by relaxing the definition of
$\epsilon$-differential privacy into $(\epsilon, \delta)$-differential privacy, which is
defined as follows:

Definition 5.1 ($(\epsilon, \delta)$-differential privacy). A
randomized algorithm $\mathcal{K}$ is $(\epsilon, \delta)$-differentially private if for any
instance I, any $I' \in nbrs(I)$, and any subset of outputs
$S \subseteq Range(\mathcal{K})$, the following holds:

$$\Pr[\mathcal{K}(I) \in S] \le \exp(\epsilon) \times \Pr[\mathcal{K}(I') \in S] + \delta$$

where the probability is taken over the randomness of $\mathcal{K}$.

$(\epsilon, \delta)$-differential privacy can be achieved by answering each
query in strategy A with i.i.d. Gaussian noise:

Theorem 5. Let W be a query matrix consisting of m
queries, and let $b_G$ be a length-m column vector consisting
of independent samples from a Gaussian distribution
$N(0, 8\ln(2/\delta))$. Then for $\epsilon < 8\ln(2/\delta)$, $\delta < 1$, the
randomized algorithm $\mathcal{L}_G$ that outputs the following vector is
$(\epsilon, \delta)$-differentially private:

$$\mathcal{L}_G(W, x) = Wx + \frac{\|W\|_2}{\epsilon}\, b_G.$$

Recall the proof of Proposition 4 and apply it to
Theorem 5. Minimizing the total error under $(\epsilon, \delta)$-differential
privacy is equivalent to solving Problem 3.

A semidefinite program (Program 5.2) can be used to solve
Problem 3. For a given solution X of Program 5.2, any
n x n matrix A such that $X = A^{T}A$ is a valid solution to
Problem 3. Moreover, when $\delta$ is given, the Gaussian noise
added under $(\epsilon, \delta)$-differential privacy is $\Theta(\|A\|_2^2/\epsilon^2)$.
According to the relationship between the L1 and L2 norms, the
Laplace noise added under $\epsilon$-differential privacy is
$\Omega(\|A\|_2^2/\epsilon^2)$, which indicates that relaxing the definition of
differential privacy will always lead to better utility.

5.2.2 Singular value bound approximation 

Another way to bound the L1 sensitivity is based on its
geometric properties. Recall that the matrix A can be
represented by its singular value decomposition $A = Q_A D_A P_A^{T}$.
Let us consider the geometric interpretation of the sensitivity.
The sensitivity of A can be considered as the radius of the
minimum L1 ball that covers all column vectors of A, and the
column vectors of A lie on the ellipsoid

$$\phi_A : x^{T}Q_A^{T}(D_A^{T}D_A)^{-1}Q_A x = 1.$$

Let $\Delta_{\phi_A}$ denote the radius of the minimum L1 ball that covers
the ellipsoid $\phi_A$. Notice all the column vectors of A are
contained in $\phi_A$, which indicates $\Delta_A \le \Delta_{\phi_A}$. The minimum
sensitivity that can be achieved by the strategies that are
profile equivalent to A can be bounded as follows:

$$\min_{B:\,B^{T}B = A^{T}A} \Delta_B \;\le\; \min_{B:\,B^{T}B = A^{T}A} \Delta_{\phi_B}.$$

The matrix B that is profile equivalent to A and has the
minimum $\Delta_{\phi_B}$ is given by the theorem below.

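Theorem 6. Let A be a matrix with singular value
decomposition $A = Q_A D_A P_A^{T}$ and $\delta_1, \delta_2, \dots, \delta_n$ be its singular
values. Then

$$\operatorname*{argmin}_{B:\,B^{T}B = A^{T}A} \Delta_{\phi_B} = D_A P_A^{T},$$

$$\min_{B:\,B^{T}B = A^{T}A} \Delta_{\phi_B} = \sqrt{\delta_1^2 + \delta_2^2 + \cdots + \delta_n^2} \;\le\; \sqrt{n}\,\Delta_A. \qquad (6)$$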



Using the singular value bound in Theorem 6 to
substitute for the L1 sensitivity, the minError problem can be
converted to the following approximation problem.

Problem 4 (Singular value bound approximation).
Given a workload matrix W, find the strategy A that minimizes

$$(\delta_1^2 + \delta_2^2 + \cdots + \delta_n^2)\,\mathrm{trace}(W(A^{T}A)^{-1}W^{T}),$$

where $\delta_1, \delta_2, \dots, \delta_n$ are the singular values of A.

The singular value bound approximation has a closed-form 
solution. 

Theorem 7. Let W be the workload matrix with singular
value decomposition $W = Q_W D_W P_W^{T}$ and $\delta'_1, \delta'_2, \dots, \delta'_n$ be
its singular values. The optimal solution $D_A$, $P_A$ to the
singular value bound approximation is to let $P_A = P_W$ and

$$D_A = \mathrm{diag}\left(\sqrt{\delta'_1}, \sqrt{\delta'_2}, \dots, \sqrt{\delta'_n}\right).$$

The solution in Theorem 7 is very similar to the strategy
mentioned at the end of Sec. 4 that matches $P_A$ to $P_W$ and
lets $D_A$ be $\mathrm{diag}(\delta'_1, \delta'_2, \dots, \delta'_n)$. We use a slightly different $D_A$ so
as to provide a guaranteed error bound based on Theorem 6.
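The closed form in Theorem 7 is easy to evaluate with an off-the-shelf SVD. The sketch below assumes NumPy, a hypothetical full-rank prefix-sum workload, and takes $Q = I$ so that $A = D_A P_W^{T}$; the helper `sv_bound_objective` is our name for the Problem 4 objective. At the closed-form solution the objective collapses to the squared sum of the workload's singular values.

```python
import numpy as np

def sv_bound_objective(W, A):
    """Problem 4 objective: (sum of squared singular values of A) * trace term."""
    # The squared Frobenius norm equals the sum of squared singular values.
    sq_singular_sum = np.linalg.norm(A, "fro") ** 2
    trace_term = np.trace(W @ np.linalg.inv(A.T @ A) @ W.T)
    return sq_singular_sum * trace_term

# Hypothetical full-rank workload (an assumption): prefix sums over 4 cells.
W = np.tril(np.ones((4, 4)))

# NumPy returns W = U @ diag(s) @ Pt, so Pt plays the role of P_W^T.
_, s, Pt = np.linalg.svd(W)
A = np.diag(np.sqrt(s)) @ Pt

best = sv_bound_objective(W, A)
baseline = sv_bound_objective(W, np.eye(4))   # identity strategy for comparison
print(best, baseline)
```

The comparison with the identity strategy only concerns the bound being optimized; the true L1 sensitivity of `A` still has to be computed separately to obtain its actual total error.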

Theorem 8. Given a workload W, let A be the optimal
solution to the minError problem and A' be the optimal
solution to the singular value bound approximation. Then

$$\mathrm{TotalError}_{A'}(W) \le n\,\mathrm{TotalError}_{A}(W).$$

5.3 Augmentation Heuristic 

We formalize below the following intuition: as far as the 
error profile is concerned, additional noisy query answers can 
never detract from query accuracy as they must have some 
information content useful to one or more queries. Therefore 
the error profile can never be worse after augmenting the 
query strategy by adding rows. 

Theorem 9. (Augmenting a strategy) Let A be a query
strategy with full rank and consider a new strategy A'
obtained from A by adding the additional rows of strategy B,
so that $A' = \begin{bmatrix} A \\ B \end{bmatrix}$. For any query w, we have:

$$w^{T}(A'^{T}A')^{-1}w \le w^{T}(A^{T}A)^{-1}w.$$

Further, $w^{T}(A'^{T}A')^{-1}w = w^{T}(A^{T}A)^{-1}w$ only for the queries
in the set $\{A^{T}Aw \mid Bw = 0\}$, which is non-empty if and only
if B does not have full column rank.



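Program 5.3 Minimizing the sensitivity

Given: $M \in \mathbb{R}^{n \times n}$.
Minimize: r.
Subject to: For $i \in [n], j \in [m]$:

$$c_{ji} \ge a_{ji}, \quad c_{ji} \ge -a_{ji}, \quad \sum_{k=1}^{m} c_{ki} \le r$$

$$\mathrm{rank}\begin{bmatrix} I_n & A \\ A^{T} & M^{-1} \end{bmatrix} = n$$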

The proof is included in Appendix D. 

This improvement in the error profile may have a cost:
augmenting A with strategy B may lead to a
strategy A' with greater sensitivity than A. A heuristic that
follows from Theorem 9 is to augment strategy A only by
completing deficient columns, that is, by adding rows with
non-zero entries only in columns whose absolute column sums
are less than the sensitivity of A. In this case the augmentation
does not increase sensitivity and is guaranteed to strictly
improve accuracy for any query with a non-zero coefficient in
an augmented column.
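A small numeric sketch of this heuristic follows, assuming NumPy and a hypothetical 2 x 2 strategy whose first column is deficient. Filling all of the slack with a single extra row is just one simple way to complete the columns; the helper names are ours.

```python
import numpy as np

def sensitivity(A):
    """L1 sensitivity: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()

def profile_term(A, w):
    """Profile term w^T (A^T A)^{-1} w for a query vector w."""
    return w @ np.linalg.inv(A.T @ A) @ w

# Hypothetical strategy: column sums are [1, 2], so column 0 is deficient.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

column_sums = np.abs(A).sum(axis=0)
slack = sensitivity(A) - column_sums       # [1, 0]: room left in each column

# One extra row with non-zero entries only where there is slack.
A_aug = np.vstack([A, slack])

w = np.array([1.0, 0.0])                   # query touching the augmented column
before = profile_term(A, w)
after = profile_term(A_aug, w)
print(before, after)
```

The sensitivity is unchanged by construction, while the profile term for the query on the augmented column drops.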

Our techniques could also be used to reason formally about
augmentations that do incur a sensitivity cost. We leave this
as future work, as it is relevant primarily to an interactive
differentially private mechanism which is not our focus here.

5.4 Minimizing the sensitivity 

We now return to Problem 2, which finds the strategy with
least sensitivity that results in a given profile. This problem
is important whenever one has a specific profile in mind (e.g.
the profile of strategy $H_n$ or $Y_n$), or when one has used another
method to compute a desired profile (e.g. the approximation
methods from Section 5.2). Recall that for a fixed error
profile, the profile-equivalent strategies are determined by
the choice of a rotation matrix Q which then determines the
sensitivity of the strategy. The following theorem formulates
the problem of minimizing the sensitivity as a semidefinite
program with rank constraint.

Theorem 10. Given an error profile M, Program 5.3 is
a semidefinite program with rank constraint that outputs a
square matrix A such that $(A^{T}A)^{-1} = M$ and such that the
sensitivity of A is minimized.

6. APPLICATIONS 

In this section we use our techniques to analyze and
improve existing approaches. We begin by analyzing two
techniques proposed recently [17, 11]. Both strategies can be seen
as instances of the matrix mechanism, each using different
query strategies designed to support a workload consisting
of all range queries. Although both techniques can support
multidimensional range queries, we focus our analysis on one
dimensional range queries, i.e. interval queries with respect
to a total order over dom(B).

We will show that the seemingly distinct approaches have
remarkably similar behavior: they have low (but not
minimal) sensitivity, and they are highly accurate for range queries
but much worse for queries that are not ranges. We describe
these techniques briefly and how they can each be
represented in matrix form.

In the hierarchical scheme proposed in [11], the query strat- 
egy can be envisioned as a recursive partitioning of the do- 
main. We consider the simple case of a binary partitioning, 
although higher branching factors were considered in [11]. 
First we ask for the total sum over the whole domain, and 
then ask for the count of each half of the domain, and so 
on, terminating with counts of individual elements of the 
domain. For a domain of size n (assumed for simplicity to 
be a power of 2), this results in a query strategy consisting
of $2n - 1$ rows. We represent this strategy as matrix $H_n$,
and $H_4$ in Fig. 1 is a small instance of it.

In the wavelet scheme, proposed in [17], query strategies 
are based on the Haar wavelet. For one dimensional range 
queries, the technique can also be envisioned as a hierarchical 
scheme, asking the total query, then asking for the difference
between the left half and right half of the domain, continuing
to recurse, asking for the difference in counts between each
binary partition of the domain at each step.¹ This results in
n queries, fewer than the hierarchical scheme of [11]. The
matrix corresponding to this strategy is the matrix of the
Haar wavelet transform, denoted $Y_n$, and $Y_4$ in Fig. 1 is a
small instance of it.

Thus $H_n$ is a rectangular $(2n - 1) \times n$ strategy, with answers
derived using the linear regression technique, and $Y_n$ is an
n x n strategy with answers derived by inverting the strategy
matrix. As suggested by the examples in earlier sections,
these seemingly different techniques have similar behavior.
We analyze them in detail below, proving new bounds on
the error for each technique, and proving new results about
their relationship to one another. We also include $I_n$ in the
analysis, which is the strategy represented by the dimension
n identity matrix, which asks for each individual count.
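Both strategy matrices can be generated directly from the verbal descriptions above. The sketch below assumes NumPy, a domain size that is a power of two, and binary partitioning; the function names `hierarchical` and `haar` are ours, and the row order is an arbitrary choice that may differ from Fig. 1.

```python
import numpy as np

def hierarchical(n):
    """H_n: one row per node of a binary tree over the domain (2n - 1 rows)."""
    rows = []
    size = n
    while size >= 1:
        for start in range(0, n, size):
            row = np.zeros(n)
            row[start:start + size] = 1      # count over this node's interval
            rows.append(row)
        size //= 2
    return np.array(rows)

def haar(n):
    """Y_n: the total query, then left-minus-right differences at each level."""
    rows = [np.ones(n)]
    size = n
    while size >= 2:
        half = size // 2
        for start in range(0, n, size):
            row = np.zeros(n)
            row[start:start + half] = 1          # left half of the interval
            row[start + half:start + size] = -1  # right half of the interval
            rows.append(row)
        size = half
    return np.array(rows)

def sensitivity(A):
    """L1 sensitivity: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()

H8, Y8 = hierarchical(8), haar(8)
print(H8.shape, Y8.shape, sensitivity(H8), sensitivity(Y8))
```

Each domain element appears in exactly one query per level, which is why both strategies have sensitivity equal to the number of levels.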

6.1 Geometry of $I_n$, $H_n$ and $Y_n$

Recall from Section 4 that the decomposition of the error
profile of a strategy explains its error. The decomposition
of $I_n$ results in a D that is itself the identity matrix. This
means the error profile is spherical. To understand the shape
and rotation of the error profiles for $Y_n$ and $H_n$ we provide a
complete analysis of the decomposition, but leave the details
to Appendix F.1. The eigenvalues and eigenvectors are
shown in Table 1 of Appendix F.1. Their eigenvalue
distributions are remarkably similar. Each has log n + 1 distinct
eigenvalues of geometrically increasing frequency. The actual
eigenvalues of $H_n$ are smaller than those of $Y_n$ by exactly
one throughout the increasing sequence, except the largest
eigenvalue: it is equal to the second largest eigenvalue in
$Y_n$, but it has a distinct value in $H_n$. Finally, the smallest
eigenvalue of either approach is 1 and the ratio between their
corresponding eigenvalues is in the range $[\frac{1}{2}, 2]$.

For sensitivity, it is clear that $\Delta_{I_n} = 1$ for all n. Intuitively,
this sensitivity should be minimal since the columns of $I_n$
are axis aligned and orthogonal, and any rotation of $I_n$ can
only increase the L1 ball containing the columns of $I_n$. This
intuition can be formalized by considering the relationship
between the L1 norm and the L2 norm stated in Section
5.2.1. No strategy profile equivalent to $I_n$ can have lower
sensitivity, since $\Delta_{I_n} = \|I_n\|_2 = 1$.

¹We note that the technique in [17] is presented somewhat
differently, but that the differences are superficial. The
authors use queries that compute averages rather than sums,
and their differentially private mechanism adds scaled noise
at each level in the hierarchy. We prove the equivalence of
that construction with our formulation $Y_n$ in App. E.

On the other hand, the sensitivity of $Y_n$ and $H_n$ is not
minimal, suggesting that there exist strategies that dominate
both of them. We have $\Delta_{Y_n} = \Delta_{H_n} = \log_2 n + 1$. In
addition we find that their L2 norms are also equal:
$\|Y_n\|_2 = \|H_n\|_2 = \sqrt{\log_2 n + 1}$. This L2 norm is a lower bound on
the sensitivity of profile equivalent strategies for both $H_n$ and
$Y_n$. We do not know if there are profile equivalent strategies
that achieve this sensitivity lower bound for these strategies.
We can, however, improve on the sensitivity of both. As an
example, Fig. 4 shows strategies profile equivalent to $H_4$ and $Y_4$ with
improved sensitivity. Through our decomposition of $H_n$ and
$Y_n$ we have derived modest improvements on the sensitivity
in the case of arbitrary n > 8: $\log n + 0.64$ for $H_n$, which is
the sensitivity of its decomposition, and $\log n + 2\sqrt{2} - 4$ for
$Y_n$, which is achieved by applying some minor modifications
to its decomposition. We suspect it is possible to find
rotations of $H_n$ and $Y_n$ that improve more substantially on the
sensitivity.

6.2 Error analysis for $I_n$, $H_n$ and $Y_n$

In this section we analyze the total and worst case error
for specific workloads of interest. We focus on two typical
workloads: $W_R$, the set of all range queries, and $W_{01}$, which
includes arbitrary predicate queries, since it consists of all
linear 0-1 queries. Note that attempting to use either
of these workloads as strategies leads to poor results: the
sensitivity of $W_R$ is $O(n^2)$ while the sensitivity of $W_{01}$ is
$O(2^n)$.

In the original papers describing $H_n$ and $Y_n$ [11, 17], both
techniques are shown to have worst case error bounded by
$O(\log^3 n)$ on $W_R$. Both papers resort to experimental
analysis to understand the distribution of error across the class
of range queries. We note that our results allow the error for any
query to be computed analytically.

It follows from the similarity of eigenvectors and
eigenvalues of $H_n$ and $Y_n$ that the error profiles are asymptotically
equivalent to one another. We thus prove a close equivalence
between the error of the two techniques:

Theorem 11. For any linear counting query w,

$$\tfrac{1}{2}\,\mathrm{Error}_{Y_n}(w) \le \mathrm{Error}_{H_n}(w) \le 2\,\mathrm{Error}_{Y_n}(w).$$

Note that this equivalence holds for the hierarchical
strategy with a branching factor of two. A higher branching factor
can lower the error rates of the hierarchical strategy
compared with the wavelet technique.
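The bound in Theorem 11 can be checked numerically for a small domain. The sketch below assumes NumPy and n = 8, and rebuilds both strategies with helper functions of our own naming. Since the two strategies have equal sensitivity, the ratio of their errors for a query w equals the ratio of profile terms, which lies between the extreme eigenvalues of $M_Y^{-1}M_H$.

```python
import numpy as np

def hierarchical(n):
    """H_n: one count query per node of a binary tree over the domain."""
    rows, size = [], n
    while size >= 1:
        for start in range(0, n, size):
            row = np.zeros(n)
            row[start:start + size] = 1
            rows.append(row)
        size //= 2
    return np.array(rows)

def haar(n):
    """Y_n: the total query, then left-minus-right differences per level."""
    rows, size = [np.ones(n)], n
    while size >= 2:
        half = size // 2
        for start in range(0, n, size):
            row = np.zeros(n)
            row[start:start + half] = 1
            row[start + half:start + size] = -1
            rows.append(row)
        size = half
    return np.array(rows)

n = 8
H, Y = hierarchical(n), haar(n)
M_H = np.linalg.inv(H.T @ H)      # error profile of H_n
M_Y = np.linalg.inv(Y.T @ Y)      # error profile of Y_n

# Extreme values of (w M_H w^T) / (w M_Y w^T) over all non-zero w.
ratios = np.linalg.eigvals(np.linalg.inv(M_Y) @ M_H).real
lo, hi = ratios.min(), ratios.max()

# Spot check with one random query.
w = np.random.default_rng(1).normal(size=n)
sample_ratio = (w @ M_H @ w) / (w @ M_Y @ w)
print(lo, hi, sample_ratio)
```

This is only a finite check for one domain size, not a substitute for the proof, but it shows where the constants 1/2 and 2 come from: they bound the ratios of corresponding eigenvalues of the two profiles.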

Next we summarize the maximum and total error for these
strategies. The following results tighten known bounds for
$W_R$, and show new bounds for $W_{01}$. The proof of the
following theorem can be found in Appendix F.2.

Theorem 12 (Maximum and Total Error). The
maximum and total error on workloads $W_R$ and $W_{01}$ using
strategies $H_n$, $Y_n$, and $I_n$ is given by:

MaxError     $H_n$                               $Y_n$                               $I_n$
$W_R$        $\Theta(\log^3 n/\epsilon^2)$          $\Theta(\log^3 n/\epsilon^2)$          $\Theta(n/\epsilon^2)$
$W_{01}$     $\Theta(n\log^2 n/\epsilon^2)$         $\Theta(n\log^2 n/\epsilon^2)$         $\Theta(n/\epsilon^2)$

TotalError   $H_n$                               $Y_n$                               $I_n$
$W_R$        $\Theta(n^2\log^3 n/\epsilon^2)$       $\Theta(n^2\log^3 n/\epsilon^2)$       $\Theta(n^3/\epsilon^2)$
$W_{01}$     $\Theta(n2^n\log^2 n/\epsilon^2)$      $\Theta(n2^n\log^2 n/\epsilon^2)$      $\Theta(n2^n/\epsilon^2)$



While $H_n$ and $Y_n$ achieve similar asymptotic bounds, their
error profiles are slightly different (as suggested by previous
examples for n = 4). As a result, $H_n$ tends to have lower
error for larger range queries, while $Y_n$ has lower error for
unit counts and smaller range queries.

7. RELATED WORK 

Since differential privacy was first introduced [8], it has 
been the subject of considerable research, as outlined in re- 
cent surveys [5, 6, 7]. 

Closest to our work are the two techniques, developed
independently, for answering range queries over histograms. Xiao
et al. [17] propose an approach based on the Haar wavelet;
Hay et al. [11] propose an approach based on hierarchical
sums and least squares. The present work unifies these two
apparently disparate approaches under a significantly more
general framework (Section 3) and uses the framework to
compare the approaches (Section 6). While both approaches
are instances of the matrix mechanism, the specific
algorithms given in these papers are more efficient than a generic
implementation of the matrix mechanism employing matrix
inversion. Xiao et al. also extend their wavelet approach to
nominal attributes and multi-dimensional histograms.

Barak et al. [1] consider a Fourier transformation of the 
data to estimate low-order marginals over a set of attributes. 
The main utility goal of [1] is integral consistency: the num- 
bers in the marginals must be non-negative integers and their 
sums should be consistent across marginals. Their main re- 
sult shows that it is possible to achieve integral consistency 
(via Fourier transforms and linear programming) without 
significant loss in accuracy. We would like to use the frame- 
work of the matrix mechanism to further investigate optimal 
strategies for workloads consisting of low-order marginals. 

Blum et al. [2] propose a mechanism for accurately an- 
swering queries for an arbitrary workload (aka query class), 
where the accuracy depends on the VC-dimension of the 
query class. However, the mechanism is inefficient, requiring 
exponential runtime. They also propose an efficient strategy 
for the class of range queries, but this approach is less ac- 
curate than the wavelet or hierarchical approaches discussed 
here (see Hay et al. [11] for comparison). 

Some very recent works consider improvements on the
Laplace mechanism for multiple queries. Hardt and
Talwar [10] consider a very similar task based on sets of linear
queries. They propose the K-norm mechanism, which adds
noise tailored to the set of linear queries by examining the
shape to which the linear queries map the L1 ball. They also
show an interesting lower bound on the noise needed for
satisfying differential privacy that matches their upper bound
up to polylogarithmic factors, assuming the truth of a central
conjecture in convex geometry. But the proposed K-norm
mechanism can be inefficient in practice because of its
requirement of sampling uniformly from high-dimensional
convex bodies. Furthermore, the techniques restrict the number
of queries to be less than n (the domain size). A notable
difference in our approach is that our computational cost is
incurred for finding the query strategy. Once a strategy is
found, our mechanism is as efficient as the Laplace
mechanism. For stable or recurring workloads, optimization needs
only to be performed once.

Roth and Roughgarden [15] consider the interactive setting,
in which queries arrive over time and must be an- 
swered immediately without knowledge of future queries. 
They propose the median mechanism which improves upon 
the Laplace mechanism by deriving answers to some queries 
from the noisy answers already received from the private 
server. The straightforward implementation of the median 
mechanism is inefficient and requires sampling from a set 
of super-polynomial size, while a more efficient polynomial 
implementation requires weakening the privacy and utility 
guarantees to average-case notions (i.e., guarantees hold for 
most but not all input datasets). 

The goal of optimal experimental design [14] is to produce 
the best estimate of an unknown vector from the results of 
a set of experiments returning noisy observations. Given the 
noisy observations, the estimate is typically the least squares 
solution. The goal is to minimize error by choosing a subset 
of experiments and a frequency for each. A relaxed version 
of the experimental design problem can be formulated as 
a semi-definite program [3]. While this problem setting is 
similar to ours, a difference is that the number and choice 
of experiments is constrained to a fixed set. In addition, 
although experimental design problems can include costs as- 
sociated with individual experiments, modeling the impact 
of the sensitivity of experiments does not fit most problem 
formulations. Lastly, the objective function of most exper- 
imental design problems targets the accuracy of individual 
variables (the x counts), rather than a specified workload 
computed from those counts. 

8. CONCLUSION 

We have described the matrix mechanism, which derives 
answers to a workload of counting queries from the noisy an- 
swers to a different set of strategy queries. By designing the 
strategy queries for the workload, correlated sets of counting 
queries can be answered more accurately. We show that the 
optimal strategy can be computed by iteratively solving a 
pair of semidefinite programs, and we use our framework to 
understand two recent techniques targeting range queries. 

While we have formulated the choice of strategy matrix as 
an optimization problem, we have not yet generated optimal — 
or approximately optimal — solutions for specific workloads. 
Computing such optimal strategies for common workloads 
would have immediate practical impact as it could boost the 
accuracy that is efficiently achievable under differential privacy.
We also plan to apply our approach to interactive query
answering settings, and we would like to understand the con- 
ditions under which optimal strategies in our framework can 
match known lower bounds for differential privacy. 

APPENDIX 



A. SCALING QUERY STRATEGIES 

The matrix mechanism always adds identically-distributed noise to each query in the strategy matrix. In addition, in 
Section 5, the sensitivity of all considered strategies is bounded by 1. In this section, we demonstrate that those constraints 
do not limit the power of the mechanism or the generality of the optimization solutions. 

A.1 Scalar multiplication of query strategy 

Multiplying a query strategy by a scalar value (which scales every coefficient of every query in the strategy) does not change 
the error of any query. The sensitivity term is scaled, but it is compensated by a scaling of the error profile. 

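Proposition 5 (Error equivalence under scalar multiplication). Given query strategy $A$ and any real scalar 
$k \neq 0$, the error for any query $w$ is equivalent for strategies $A$ and $kA$. That is, $\forall w$, $\mathrm{Error}_{A}(wx) = \mathrm{Error}_{kA}(wx)$. 

Proof. It is easy to see that $\Delta_{kA} = |k|\,\Delta_A$. Thus, for any query $w$, we have 

\begin{align*}
\mathrm{Error}_{kA}(wx) &= \frac{2}{\epsilon^2}\,\Delta_{kA}^2\, w\big((kA)^t(kA)\big)^{-1}w^t \\
&= \frac{2}{\epsilon^2}\,k^2\Delta_A^2\, w\big(k^2A^tA\big)^{-1}w^t \\
&= \frac{2}{\epsilon^2}\,k^2\Delta_A^2\, w\,\frac{1}{k^2}\big(A^tA\big)^{-1}w^t \\
&= \frac{2}{\epsilon^2}\,\Delta_A^2\, w\big(A^tA\big)^{-1}w^t \\
&= \mathrm{Error}_{A}(wx).
\end{align*}

□ 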
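
The cancellation in Proposition 5 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names (`sensitivity`, `profile_2x2`, `query_error`, `scale`) are hypothetical, and the code assumes a strategy with exactly two columns so that the inverse of $A^tA$ can be written in closed form.

```python
from typing import List

Matrix = List[List[float]]


def sensitivity(a: Matrix) -> float:
    """L1 sensitivity: the largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def profile_2x2(a: Matrix) -> Matrix:
    """Return (A^t A)^{-1} for a full-rank strategy with exactly two columns."""
    g00 = sum(r[0] * r[0] for r in a)
    g01 = sum(r[0] * r[1] for r in a)
    g11 = sum(r[1] * r[1] for r in a)
    det = g00 * g11 - g01 * g01
    return [[g11 / det, -g01 / det], [-g01 / det, g00 / det]]


def query_error(w: List[float], a: Matrix, eps: float = 1.0) -> float:
    """Error of query w under strategy A: (2/eps^2) * Delta_A^2 * w (A^t A)^{-1} w^t."""
    m = profile_2x2(a)
    quad = sum(w[i] * m[i][j] * w[j] for i in range(2) for j in range(2))
    return 2.0 / eps ** 2 * sensitivity(a) ** 2 * quad


def scale(a: Matrix, k: float) -> Matrix:
    """Multiply every coefficient of every strategy query by k."""
    return [[k * v for v in row] for row in a]
```

For example, `query_error([1.0, -2.0], A)` and `query_error([1.0, -2.0], scale(A, -3.5))` agree up to floating-point rounding for any full-rank two-column `A`, including for negative scalars.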

A.2 Strategies with unequal noise - scaling rows 

The strategies described in this paper add equal noise to each query; i.e., the independent Laplace random variables have 
equal scale. It is sufficient to focus on equal-noise strategies because, as the following proposition shows, any unequal-noise 
strategy can be simulated by an equal-noise strategy. 

Proposition 6 (Simulating unequal noise strategies). Let $\mathcal{K}_A$ be an unequal-noise strategy that returns $y = Ax + E$, 
where $E$ is an $m$-length vector of independent Laplace random variables with unequal scale. Let $b_i$ denote the scale of the $i$-th 
Laplace random variable in the vector. There exists an equal-noise strategy such that its output, $y'$, has the same distribution 
as $y$. 

Proof. Let $R$ be an $m \times m$ diagonal matrix with $r_{ii} = b/b_i$ for an arbitrary $b$. Let the equal-noise strategy be defined as 
$\mathcal{K}_B = Bx + \mathrm{Lap}(b)$ where the query matrix $B = RA$. Let $y' = R^{-1}(Bx + \mathrm{Lap}(b))$. 
The claim is that $y$ and $y'$ have the same distribution. Vector $y'$ can be expressed as: 

\begin{align*}
y' &= R^{-1}(Bx + \mathrm{Lap}(b)) \\
&= R^{-1}Bx + R^{-1}\mathrm{Lap}(b) \\
&= Ax + R^{-1}\mathrm{Lap}(b)
\end{align*}

Observe that $R^{-1}\mathrm{Lap}(b)$ is an $m \times 1$ vector where the $i$-th entry is equal to $\frac{b_i}{b}\mathrm{Lap}(b)$. The quantity $\frac{b_i}{b}\mathrm{Lap}(b)$ follows a Laplace 
distribution with scale $b_i$, and is therefore equal in distribution to the $i$-th entry of $E$, which is $\mathrm{Lap}(b_i)$. Therefore $y'$ is equal 
in distribution to $y$. □ 
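
The row-scaling construction in the proof can be sketched as follows. This is an illustrative simulation under stated assumptions, not an implementation from the paper: the function names are hypothetical, and Laplace samples are drawn as the difference of two exponential variates, which is a standard way to obtain a Laplace variable with the given scale.

```python
import random
from typing import List


def laplace(scale: float, rng: random.Random) -> float:
    """Difference of two exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_unequal_noise(a: List[List[float]], x: List[float],
                           scales: List[float], b: float,
                           rng: random.Random) -> List[float]:
    """Answer B = R A with equal noise Lap(b), then undo the row scaling.

    Row i of the result is distributed as (A x)_i + Lap(scales[i]).
    """
    out = []
    for row, b_i in zip(a, scales):
        r_ii = b / b_i                                   # diagonal entry of R
        true_answer = sum(c * v for c, v in zip(row, x))
        noisy_scaled = r_ii * true_answer + laplace(b, rng)  # row i of Bx + Lap(b)
        out.append(noisy_scaled / r_ii)                  # row i of R^{-1}(Bx + Lap(b))
    return out
```

The mean absolute deviation of a Laplace variable equals its scale, so averaging `abs(y_i - (Ax)_i)` over many runs should approach `scales[i]` regardless of the common scale `b` that was chosen.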

B. RELAXATION OF DIFFERENTIAL PRIVACY 

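$(\epsilon, \delta)$-differential privacy is introduced in Section 5.2.1. Here we formally prove the amount of Gaussian noise required to 
achieve $(\epsilon, \delta)$-differential privacy. 

Theorem 5. Let $W$ be a query matrix consisting of $m$ queries, and let $b$ be a length-$m$ column vector consisting of 
independent samples from a Gaussian distribution $N(0, 8\ln(2/\delta))$. Then for $\epsilon < 8\ln(2/\delta)$, $\delta < 1$, the randomized 
algorithm $\mathcal{G}$ that outputs the following vector is $(\epsilon, \delta)$-differentially private: 

\[ \mathcal{G}(W, x) = Wx + \frac{\|W\|_2}{\epsilon}\, b \]

Proof. Let $I$ and $I'$ be neighboring databases, so that their corresponding vectors $x$ and $x'$ differ in exactly one component. 
Notice 

\[ \max_{\|x - x'\|_1 = 1} \|Wx - Wx'\|_2 = \|W\|_2, \]

and consider adding the column vector $(\|W\|_2/\epsilon)\,b$ to $Wx$. Let 

\[ \sigma = \frac{\|W\|_2}{\epsilon}\, 2\sqrt{2\ln(2/\delta)}. \]

For any vector equal to $Wx + z$, we have 

\begin{align*}
\frac{\Pr[Wx + \frac{\|W\|_2}{\epsilon} b = Wx + z]}{\Pr[Wx' + \frac{\|W\|_2}{\epsilon} b = Wx + z]}
&= \frac{e^{-\frac{1}{2\sigma^2} z^t z}}{e^{-\frac{1}{2\sigma^2}(Wx - Wx' + z)^t(Wx - Wx' + z)}} \\
&= e^{\frac{1}{2\sigma^2}\left(\|Wx - Wx'\|_2^2 + 2z^t(Wx - Wx')\right)} \\
&\le e^{\frac{\epsilon^2}{16\ln(2/\delta)} + \frac{\epsilon^2\, z^t(Wx - Wx')}{8\|W\|_2^2\ln(2/\delta)}} \\
&\le e^{\frac{\epsilon}{2} + \frac{\epsilon^2\, z^t(Wx - Wx')}{8\|W\|_2^2\ln(2/\delta)}}.
\end{align*}

Let $Z = \{Wx + z \mid \frac{\epsilon^2\, z^t(Wx - Wx')}{8\|W\|_2^2\ln(2/\delta)} > \frac{\epsilon}{2}\}$. To guarantee $(\epsilon, \delta)$-differential privacy, we only need to prove 

\begin{align*}
\delta &\ge \Pr[Wx + \tfrac{\|W\|_2}{\epsilon} b \in Z] \\
&= \Pr[b^t(Wx - Wx') > 4\|W\|_2\ln(2/\delta)].
\end{align*}

Notice that the entries of the random vector $b$ are independent variables following $N(0, 8\ln(2/\delta))$ and $b^t(Wx - Wx')$ can be considered 
as a weighted sum of all the entries of $b$, so $b^t(Wx - Wx')$ follows $N(0, 8\|Wx - Wx'\|_2^2\ln(2/\delta))$. Let $z$ be a variable that 
follows $N(0, 1)$. Then 

\begin{align*}
\Pr[b^t(Wx - Wx') > 4\|W\|_2\ln(2/\delta)]
&= \Pr[2\|Wx - Wx'\|_2\sqrt{2\ln(2/\delta)}\, z > 4\|W\|_2\ln(2/\delta)] \\
&= \Pr\Big[z > \frac{\|W\|_2}{\|Wx - Wx'\|_2}\sqrt{2\ln(2/\delta)}\Big] \\
&\le \Pr[z > \sqrt{2\ln(2/\delta)}].
\end{align*}

Notice that 

\[ \Pr[z > x] \le \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}, \]

so we have 

\[ \Pr[z > \sqrt{2\ln(2/\delta)}] \le \frac{1}{\sqrt{2\pi}\sqrt{2\ln(2/\delta)}}\, e^{-\ln(2/\delta)} = \frac{\delta}{4\sqrt{\pi\ln(2/\delta)}} < \delta. \]

Thus the randomized algorithm $\mathcal{G}$ satisfies $(\epsilon, \delta)$-differential privacy. □ 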
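
The noise calibration in Theorem 5 can be illustrated with a short sketch. It is written under the assumption that $\|W\|_2$ denotes the largest $L_2$ norm of a column of $W$ (the maximum of $\|Wx - Wx'\|_2$ over neighboring vectors); the function names are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List


def l2_sensitivity(w: List[List[float]]) -> float:
    """Largest L2 norm of a column of W."""
    return max(math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0])))


def gaussian_sigma(w: List[List[float]], eps: float, delta: float) -> float:
    """Standard deviation (||W||_2 / eps) * sqrt(8 ln(2/delta)) used in Theorem 5."""
    if not (0.0 < delta < 1.0 and 0.0 < eps < 8.0 * math.log(2.0 / delta)):
        raise ValueError("Theorem 5 requires delta < 1 and eps < 8 ln(2/delta)")
    return l2_sensitivity(w) / eps * math.sqrt(8.0 * math.log(2.0 / delta))


def gaussian_mechanism(w: List[List[float]], x: List[float], eps: float,
                       delta: float, rng: random.Random) -> List[float]:
    """Return Wx plus independent Gaussian noise with the calibrated sigma."""
    sigma = gaussian_sigma(w, eps, delta)
    return [sum(c * v for c, v in zip(row, x)) + rng.gauss(0.0, sigma) for row in w]
```

The parameter check mirrors the conditions stated in the theorem; outside that range the sketch refuses to produce output rather than silently using a noise level the proof does not cover.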

C. SINGULAR VALUE BOUND APPROXIMATION 

In this section we theoretically compute the approximation rate of the singular value bound approximation and the optimized 
solution under the singular value bound approximation. 

Lemma 1. Given an ellipsoid defined by $x^tZx = 1$ and a vector $v$, $v^tx = \sqrt{v^tZ^{-1}v}$ is a tangent hyperplane of the ellipsoid. 

Proof. For any point $y$ on the ellipsoid, the tangent hyperplane of the ellipsoid at $y$ is $y^tZx = 1$. Consider a tangent 
hyperplane of the ellipsoid, $v^tx = k$, where $k$ is an unknown constant. There exists a point $x_0$ on the ellipsoid such that 
$x_0^tZ = \frac{1}{k}v^t$. Therefore $x_0 = \frac{1}{k}Z^{-1}v$. Since $x_0^tZx_0 = 1$, we know 

\[ 1 = x_0^tZx_0 = \Big(\frac{1}{k}v^tZ^{-1}\Big)Z\Big(\frac{1}{k}Z^{-1}v\Big) = \frac{1}{k^2}\,v^tZ^{-1}v. \]

Therefore $k = \sqrt{v^tZ^{-1}v}$. □ 



Theorem 6. Let A be a matrix with singular value decomposition A = QaDaPa o-nd 5i,52, ■ ■ ■ ,Sn be its singular values. 
Then 

argmin A^^ = DaPa, 

B:B«B=A*A 

« ^^i^'^.t . ^-^B = j5f+d^, + ... + 5l < V^Aj,. (7) 

B:B*B=A*A » 

Proof. For any strategy B, the ellipsoid (/>b must tangent with diamond with radius A^^. With out lose of generality, let 
us assume it is tangent to the hyperplane (1,1,..., l)x = A^j^ and (ai, . . . , a„)x < A^^, here at = 1,-1. Let B = QbDaPa 
be the singular value decomposition of B and let ^ = Qb(DaDa) ^Qb to simplify the notation. According to Lemma 1, 

(1, 1, ... , !)*(!, 1 . . . , 1)* > (ai, . . . , o„)*(oi, . . . , o„)*. 



In particular, 

(1, 1, ..1)*(1, 1 . . . , 1)* > (-1, 1, 1, 1, ... , !)*(-!, 1, 1, 1, ... , 1)*, 

which means ipi2 + tpis + . . . + tpin = i'u — V'n ^ 0- Similarly, we can show for any j we have ipji — ^jj > 0. 

Therefore 

(1,1,. .!)*(!, !...,!)* 

i 3 

3 3 i 

3 

= trace(*) 

= trace(QB(DkDA)"'QB) 
= trace(QBQB(DkDA)"') 
= trace((D5vDA)"') 

Since Da is fixed, the minimize can be achieved in case that $ is a diagonal matrix, which indicates that Qb = I. Therefore, 
B = DAPk 

= = y'(l,l,..l)D*^DA(l,l...,l)' = ^5f + 5^^ + ...+5g, 

where 61,62, ■■■ ,5n are the singular values of A. 

Moreover, notice the fact that the sum of square of L2 norm of all the columns of B is same as the sum of square of L2 
norm of all the rows of B = DaPa- Since is a rotation matrix it does not change the L2 norm of rows in Da, which is 

6f + 62 + ■ ■ ■ + 6n. Notice B has n columns in total, there exists a column of B whose L2 norm is at least yJ~hj^^^2±^^^+K^ 
that < v^||B||2. Since ||B||2 = ||A||2, ||A||2 < Aa, we know A^^ < ^/uAa- □ 

Theorem 7. Let $W$ be the workload matrix with singular value decomposition $W = Q_WD_WP_W$ and $\delta'_1, \delta'_2, \ldots, \delta'_n$ be 
its singular values. The optimal solution $D_A$, $P_A$ to the singular value bound approximation is to let $P_A = P_W$ and 
$D_A = \mathrm{diag}(\sqrt{\delta'_1}, \sqrt{\delta'_2}, \ldots, \sqrt{\delta'_n})$. 

Proof. Recall the total error with the singular value approximation: 

\[ (\delta_1^2 + \delta_2^2 + \ldots + \delta_n^2)\,\mathrm{trace}(W(A^tA)^{-1}W^t), \]

where $\delta_1, \delta_2, \ldots, \delta_n$ are the singular values of $A$. Notice 

\begin{align*}
\mathrm{trace}(W(A^tA)^{-1}W^t) &= \mathrm{trace}((A^tA)^{-1}W^tW) \\
&= \mathrm{trace}(P_A^t(D_A^tD_A)^{-1}P_A\,P_W^tD_W^tD_WP_W) \\
&= \mathrm{trace}(D_W(P_AP_W^t)^t(D_A^tD_A)^{-1}(P_AP_W^t)D_W^t),
\end{align*}

and $P_A$ does not influence the singular value approximation to the sensitivity. $P_A$ can be an arbitrary orthogonal matrix, so 
$P_AP_W^t$ can be an arbitrary orthogonal matrix as well. The best $P_AP_W^t$ is the one that minimizes the error 
of estimating $D_W$ with the given $D_A$. Since $D_W$ is actually a set of queries over individual buckets, the best strategy to estimate 
it is also a set of queries over individual buckets, which means $P_AP_W^t = I$. Since $P_W$ is an orthogonal matrix, $P_A = P_W$. Then the 
total error with the singular value approximation is: 

\begin{align*}
(\delta_1^2 + \delta_2^2 + \ldots + \delta_n^2)\,\mathrm{trace}((D_A^tD_A)^{-1}D_W^tD_W)
&= (\delta_1^2 + \delta_2^2 + \ldots + \delta_n^2)\Big(\frac{\delta_1'^2}{\delta_1^2} + \frac{\delta_2'^2}{\delta_2^2} + \ldots + \frac{\delta_n'^2}{\delta_n^2}\Big) \\
&\ge (\delta'_1 + \delta'_2 + \ldots + \delta'_n)^2.
\end{align*}

Achieving the lower bound given by the inequality above requires that, for each $i$, the ratio between $\delta'_i/\delta_i$ and $\delta_i$ be constant, 
which means $\delta_i = c\sqrt{\delta'_i}$ for some constant $c$. Since a constant multiple does not change the strategy, we let $c = 1$ and 
the theorem is proved. □ 
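
The final step of the proof is an application of the Cauchy-Schwarz inequality, and it can be checked numerically. The sketch below is illustrative only: `svd_bound_total_error` is a hypothetical helper that evaluates the bound for aligned singular vectors ($P_A = P_W$), given the singular values of the strategy and of the workload.

```python
import math
from typing import List


def svd_bound_total_error(strategy_sv: List[float], workload_sv: List[float]) -> float:
    """(sum_i d_i^2) * (sum_i d'_i^2 / d_i^2) for strategy singular values d
    and workload singular values d', assuming P_A = P_W."""
    total_sq = sum(d * d for d in strategy_sv)
    ratio_sq = sum((w / d) ** 2 for w, d in zip(workload_sv, strategy_sv))
    return total_sq * ratio_sq


def optimal_strategy_sv(workload_sv: List[float]) -> List[float]:
    """Choice from Theorem 7: d_i = sqrt(d'_i)."""
    return [math.sqrt(w) for w in workload_sv]
```

With workload singular values `[4.0, 1.0, 0.25]`, the choice `[2.0, 1.0, 0.5]` attains the lower bound $(4 + 1 + 0.25)^2 = 27.5625$, while copying the workload's singular values or using all ones gives a strictly larger value.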

D. COMPLETING DEFICIENT COLUMNS 

Here we complete the proof of the theorem about augmenting the deficient columns in Section 5.3. 



Proposition 7. For any square matrix $A$, if $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $v$ is an eigenvector of $kI + A$ 
with eigenvalue $k + \lambda$. If $A$ is invertible, $v$ is an eigenvector of $A^{-1}$ with eigenvalue $1/\lambda$. 

Proof. Since $(kI + A)v = kIv + Av = kv + \lambda v = (k + \lambda)v$, we know $v$ is an eigenvector of $kI + A$ with eigenvalue $k + \lambda$. 
If $A$ is invertible, we know $\lambda \neq 0$; otherwise there exists a non-zero vector $v$ such that $Av = 0$, which contradicts the 
fact that $A$ is invertible. Moreover, since $Av = \lambda v$, we have $A^{-1}v = \frac{1}{\lambda}v$, so $v$ is an eigenvector of $A^{-1}$ with eigenvalue $1/\lambda$. □ 

Theorem 9. (Augmenting a strategy) Let $A$ be a query strategy with full rank and consider a new strategy $A'$ obtained 
from $A$ by adding the additional rows of strategy $B$, so that $A' = \begin{bmatrix} A \\ B \end{bmatrix}$. For any query $w$, we have: 

\[ w^t(A'^tA')^{-1}w \le w^t(A^tA)^{-1}w. \]

Further, $w^t(A'^tA')^{-1}w = w^t(A^tA)^{-1}w$ only for the queries in the set $\{A^tAw \mid Bw = 0\}$, which is non-empty if and only 
if $B$ does not have full column rank. 

Proof. Since $A$ is a query plan with full column rank, there exists an invertible square matrix $Q$ such that $A^tA = Q^tQ$. 
Moreover, notice that $A'^tA' = A^tA + B^tB$, so the theorem we are going to prove is equivalent to the following statement: the 
matrix $(Q^tQ)^{-1} - (Q^tQ + B^tB)^{-1}$ is positive semi-definite. Since $Q$ is an invertible matrix, for any query $w$, 

\begin{align*}
& w^t\big((Q^tQ)^{-1} - (Q^tQ + B^tB)^{-1}\big)w \\
&= \big((Q^t)^{-1}w\big)^t\big(I - Q(Q^tQ + B^tB)^{-1}Q^t\big)\big((Q^t)^{-1}w\big).
\end{align*}

Therefore it is enough to show that $I - Q(Q^tQ + B^tB)^{-1}Q^t$ is positive semi-definite, which according to Proposition 7 is equivalent to proving that all eigenvalues 
of $Q(Q^tQ + B^tB)^{-1}Q^t$ are less than or equal to 1. Furthermore, since $Q(Q^tQ + B^tB)^{-1}Q^t$ is 
invertible, according to Proposition 7, the statement that all eigenvalues of $Q(Q^tQ + B^tB)^{-1}Q^t$ are less than or equal to 1 is 
equivalent to the statement that all the eigenvalues of its inverse matrix, $(Q^t)^{-1}(Q^tQ + B^tB)Q^{-1}$, are larger than or equal to 
1. Notice that $(Q^t)^{-1}(Q^tQ + B^tB)Q^{-1} = I + (BQ^{-1})^t(BQ^{-1})$, so we can apply Proposition 7 again and it remains to prove that the eigenvalues of 
$(BQ^{-1})^t(BQ^{-1})$ are non-negative. Since for any vector $v$, $v^t(BQ^{-1})^t(BQ^{-1})v \ge 0$, we know this matrix is positive semi-definite, hence 
all its eigenvalues are non-negative. 

Moreover, according to Proposition 7, $(Q^t)^{-1}w$ is an eigenvector of $I - Q(Q^tQ + B^tB)^{-1}Q^t$ with eigenvalue 0 if and only 
if it is an eigenvector of $Q(Q^tQ + B^tB)^{-1}Q^t = (I + (BQ^{-1})^t(BQ^{-1}))^{-1}$ with eigenvalue 1, which is equivalent to the fact 
that $(Q^t)^{-1}w$ is an eigenvector of $(BQ^{-1})^t(BQ^{-1})$ with eigenvalue 0. Furthermore, notice that $A^tA$ is an invertible 
matrix and $(BQ^{-1})^t(BQ^{-1})(Q^t)^{-1}w = 0$ is equivalent to $(BQ^{-1})(Q^t)^{-1}w = B(Q^tQ)^{-1}w = B(A^tA)^{-1}w = 0$. Therefore 
$(BQ^{-1})^t(BQ^{-1})$ has eigenvalue 0 if and only if $B$ does not have full column rank. When $B$ does not have full column rank, 
the set $w_B = \{w \mid Bw = 0\}$ contains non-zero vectors, and the set of all non-zero queries $w$ such that $w^t(A'^tA')^{-1}w = w^t(A^tA)^{-1}w$ can 
be represented as $\{A^tAw \mid w \in w_B\} = \{A^tAw \mid Bw = 0\}$. □ 
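
A small numerical instance of Theorem 9 may help. The sketch below is an illustration under stated assumptions: `profile_term` is a hypothetical helper restricted to strategies with exactly two columns, $A$ is taken to be the identity, and $B$ is the single query $(1, 1)$.

```python
from typing import List

Matrix = List[List[float]]


def profile_term(w: List[float], a: Matrix) -> float:
    """w^t (A^t A)^{-1} w for a full-rank strategy with exactly two columns."""
    g00 = sum(r[0] * r[0] for r in a)
    g01 = sum(r[0] * r[1] for r in a)
    g11 = sum(r[1] * r[1] for r in a)
    det = g00 * g11 - g01 * g01
    return (w[0] * w[0] * g11 - 2.0 * w[0] * w[1] * g01 + w[1] * w[1] * g00) / det


def augment(a: Matrix, b: Matrix) -> Matrix:
    """Stack the rows of B below the rows of A."""
    return [list(r) for r in a] + [list(r) for r in b]
```

With `A = [[1, 0], [0, 1]]` and `B = [[1, 1]]`, the query `(1, 0)` improves from profile term 1 to 2/3 after augmentation, while the query `(1, -1)`, which equals $A^tAu$ for the vector $u = (1, -1)$ with $Bu = 0$, is unchanged at 2.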

E. REPRESENTING THE HAAR WAVELET TECHNIQUE 

The representation of Haar wavelet queries in Section 6 is different from their original presentation in Xiao et al. [17]. The 
following theorem shows the equivalence of both representations. 

Proposition 8 (Equivalence of Haar wavelet representations). Let $\hat{x}_{Haar}$ denote the estimate derived from the 
Haar wavelet approach of Xiao et al. [17]. Let $\hat{x}_{Y_n}$ denote the estimate from asking query strategy $Y_n$. Then $\hat{x}_{Haar}$ and $\hat{x}_{Y_n}$ are 
equal in distribution, i.e., $\Pr[\hat{x}_{Haar} \le x] = \Pr[\hat{x}_{Y_n} \le x]$ for any vector $x$. 

Proof. Given vector $x$, the Haar wavelet is defined in terms of a binary tree over $x$ such that the leaves of the tree are $x$. 

Each node in the tree is associated with a coefficient. Coefficient $c_i$ is defined as $c_i = (a_L - a_R)/2$ where $a_L$ ($a_R$) is the 
average of the leaves in the left (right) subtree of $c_i$. Each $c_i$ is associated with a weight $\mathcal{W}(c_i)$ which is equal to the number 
of leaves in the subtree rooted at $c_i$. (In addition, there is a coefficient $c_0$ that is equal to the average of $x$, with $\mathcal{W}(c_0) = n$.) 

An equivalent definition for $c_i$ is $c_i = \sum_{j=1}^{n} x_j z_i(j)$ where, for $i > 0$, 

\[ z_i(j) = \begin{cases} 1/\mathcal{W}(c_i), & \text{if } j \text{ is in the left subtree of } c_i \\ -1/\mathcal{W}(c_i), & \text{if } j \text{ is in the right subtree of } c_i \\ 0, & \text{otherwise} \end{cases} \]

For $i = 0$, $z_i(j)$ is equal to $1/\mathcal{W}(c_0)$ for all $j$. 

Let $A$ be a matrix where $A_{ij} = z_i(j)$. The $i$-th row of $A$ corresponds to coefficient $c_i$. Since there are $n$ coefficients, $A$ is an 
$n \times n$ matrix. 

The approach of [17] computes $y_{Haar} = Ax + E$, where $E$ is an $n \times 1$ vector such that each $E_i$ is an independent 
sample from a Laplace distribution with scale $b_i = \frac{1 + \log n}{\epsilon\,\mathcal{W}(c_i)}$. Observe that $E$ can be equivalently represented as: 

\[ E = R^{-1}\Big(\frac{1 + \log n}{\epsilon}\Big)b \]

where $R$ is an $n \times n$ diagonal matrix with $r_{ii} = \mathcal{W}(c_i)$. The estimate for $x$ is then equal to: 

\begin{align*}
\hat{x}_{Haar} = A^{-1}y_{Haar} &= x + A^{-1}E \\
&= x + A^{-1}R^{-1}\Big(\frac{1 + \log n}{\epsilon}\Big)b \\
&= x + (RA)^{-1}\Big(\frac{1 + \log n}{\epsilon}\Big)b
\end{align*}

We now describe an equivalent approach based on the matrix $Y_n$. Observe that $Y_n = RA$. The sensitivity of $Y_n$ is 
$\Delta_{Y_n} = 1 + \log n$. Using the matrix mechanism, the estimate $\hat{x}_{Y_n}$ is: 

\begin{align*}
\hat{x}_{Y_n} &= Y_n^{-1}\Big(Y_nx + \Big(\frac{\Delta_{Y_n}}{\epsilon}\Big)b\Big) \\
&= x + Y_n^{-1}\Big(\frac{\Delta_{Y_n}}{\epsilon}\Big)b \\
&= x + (RA)^{-1}\Big(\frac{1 + \log n}{\epsilon}\Big)b
\end{align*}

□ 



F. ANALYSIS OF THE HIERARCHICAL AND WAVELET STRATEGIES 

In Section 6 we demonstrated the results of applying the matrix mechanism to analyze the hierarchical and wavelet schemes. In 
this section, we give the detailed analysis behind those results. 

F.1 Eigen-decomposition of $H_n$ and $Y_n$ 

The following eigen-decomposition shows the similarity between $H_n$ and $Y_n$. 

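Theorem 13. Let $n$ be a power of 2, so that $n = 2^k$. The eigenvalues and their corresponding eigenvectors of $H_n^tH_n$ and 
$Y_n^tY_n$ are shown in Table 1. ($0_{a \times b}$ and $1_{a \times b}$ are the $a \times b$ matrices whose entries are all 0 and 1, respectively.) 

Eigenvalue of $H_n^tH_n$ | Eigenvalue of $Y_n^tY_n$ | Order | Eigenvectors
$1$ | $2$ | $2^{k-1}$ | $[1, -1, 0_{1\times(2^k-2)}]$, $[0, 0, 1, -1, 0_{1\times(2^k-4)}]$, ..., $[0_{1\times(2^k-2)}, 1, -1]$
$3$ | $4$ | $2^{k-2}$ | $[1_{1\times 2}, -1_{1\times 2}, 0_{1\times(2^k-4)}]$, $[0_{1\times 4}, 1_{1\times 2}, -1_{1\times 2}, 0_{1\times(2^k-8)}]$, ..., $[0_{1\times(2^k-4)}, 1_{1\times 2}, -1_{1\times 2}]$
... | ... | ... | ...
$2^j - 1$ | $2^j$ | $2^{k-j}$ | $[1_{1\times 2^{j-1}}, -1_{1\times 2^{j-1}}, 0_{1\times(2^k-2^j)}]$, $[0_{1\times 2^j}, 1_{1\times 2^{j-1}}, -1_{1\times 2^{j-1}}, 0_{1\times(2^k-2^{j+1})}]$, ..., $[0_{1\times(2^k-2^j)}, 1_{1\times 2^{j-1}}, -1_{1\times 2^{j-1}}]$
... | ... | ... | ...
$2^k - 1$ | $2^k$ | $1$ | $[1_{1\times 2^{k-1}}, -1_{1\times 2^{k-1}}]$
$2^{k+1} - 1$ | $2^k$ | $1$ | $1_{1\times 2^k}$

Table 1: Eigenvalues and eigenvectors for $H_n^tH_n$ and $Y_n^tY_n$. 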



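Proof. We will prove it by induction on $k$. When $k = 1$, 

\[ H_2^tH_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \]

whose eigenvalues are 3 and 1 with eigenvectors $[1, 1]$ and $[1, -1]$ respectively. Suppose the conclusion is right for $H_{2^{k-1}}$. 
Notice the fact that 

\[ H_{2^k} = \begin{bmatrix} 1_{1\times 2^{k-1}} & 1_{1\times 2^{k-1}} \\ H_{2^{k-1}} & 0 \\ 0 & H_{2^{k-1}} \end{bmatrix}, \]

where $1_{a \times b}$ is the $a \times b$ matrix whose entries are all 1. Therefore we have 

\[ H_{2^k}^tH_{2^k} = 1_{2^k\times 1}1_{1\times 2^k} + \begin{bmatrix} H_{2^{k-1}}^tH_{2^{k-1}} & 0 \\ 0 & H_{2^{k-1}}^tH_{2^{k-1}} \end{bmatrix} = 1_{2^k\times 2^k} + \begin{bmatrix} H_{2^{k-1}}^tH_{2^{k-1}} & 0 \\ 0 & H_{2^{k-1}}^tH_{2^{k-1}} \end{bmatrix}. \]

Notice that $1_{2^{k-1}\times 1}$ is an eigenvector of $H_{2^{k-1}}^tH_{2^{k-1}}$ with eigenvalue $2^k - 1$, so we have: 

\begin{align*}
H_{2^k}^tH_{2^k}\,1_{2^k\times 1} &= 1_{2^k\times 2^k}1_{2^k\times 1} + \begin{bmatrix} H_{2^{k-1}}^tH_{2^{k-1}}1_{2^{k-1}\times 1} \\ H_{2^{k-1}}^tH_{2^{k-1}}1_{2^{k-1}\times 1} \end{bmatrix} \\
&= 2^k\,1_{2^k\times 1} + (2^k - 1)\,1_{2^k\times 1} = (2^{k+1} - 1)\,1_{2^k\times 1},
\end{align*}

\begin{align*}
H_{2^k}^tH_{2^k}\begin{bmatrix} 1_{2^{k-1}\times 1} \\ -1_{2^{k-1}\times 1} \end{bmatrix} &= 1_{2^k\times 2^k}\begin{bmatrix} 1_{2^{k-1}\times 1} \\ -1_{2^{k-1}\times 1} \end{bmatrix} + \begin{bmatrix} H_{2^{k-1}}^tH_{2^{k-1}}1_{2^{k-1}\times 1} \\ -H_{2^{k-1}}^tH_{2^{k-1}}1_{2^{k-1}\times 1} \end{bmatrix} \\
&= (2^k - 1)\begin{bmatrix} 1_{2^{k-1}\times 1} \\ -1_{2^{k-1}\times 1} \end{bmatrix}.
\end{align*}

Moreover, for any eigenvector $v$ of $H_{2^{k-1}}^tH_{2^{k-1}}$ in Table 1 other than $1_{2^{k-1}\times 1}$, notice the fact that the sum of all entries in 
$v$ is 0. Denoting the eigenvalue of $v$ as $\mu_v$, 

\begin{align*}
H_{2^k}^tH_{2^k}\begin{bmatrix} v \\ 0_{2^{k-1}\times 1} \end{bmatrix} &= 1_{2^k\times 2^k}\begin{bmatrix} v \\ 0_{2^{k-1}\times 1} \end{bmatrix} + \begin{bmatrix} H_{2^{k-1}}^tH_{2^{k-1}}v \\ 0_{2^{k-1}\times 1} \end{bmatrix} \\
&= \mu_v\begin{bmatrix} v \\ 0_{2^{k-1}\times 1} \end{bmatrix}.
\end{align*}

Therefore $[v, 0]$ is an eigenvector of $H_{2^k}^tH_{2^k}$ with eigenvalue $\mu_v$, and we can show $[0, v]$ is an eigenvector of $H_{2^k}^tH_{2^k}$ with 
eigenvalue $\mu_v$ by a similar process. Above all, we proved that the vectors in Table 1 are all eigenvectors of $H_{2^k}^tH_{2^k}$ with 
eigenvalues shown in the table. Moreover, since any pair of eigenvectors from Table 1 are orthogonal to each other and there 
are $2^k$ vectors in the table, we know the table contains all eigenvalues of $H_{2^k}^tH_{2^k}$. 

Similar to the case of $H_{2^k}^tH_{2^k}$, the eigenvectors and eigenvalues of $Y_{2^k}^tY_{2^k}$ can be proved by induction on $k$ as well. When 
$k = 1$, 

\[ Y_2^tY_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \]

whose eigenvectors are $[1, 1]$ and $[1, -1]$, and both of them have eigenvalue 2. Suppose Table 1 gives eigenvalues and eigenvectors 
of $Y_{2^{k-1}}^tY_{2^{k-1}}$. Notice the fact that $Y_{2^k}$ consists of the two rows $1_{1\times 2^k}$ and $[1_{1\times 2^{k-1}}, -1_{1\times 2^{k-1}}]$, together with the rows 
of $Y_{2^{k-1}}$ other than $1_{1\times 2^{k-1}}$ applied separately to each half of the domain. Therefore, 

\begin{align*}
Y_{2^k}^tY_{2^k} &= 1_{2^k\times 2^k} + \begin{bmatrix} 1_{2^{k-1}\times 2^{k-1}} & -1_{2^{k-1}\times 2^{k-1}} \\ -1_{2^{k-1}\times 2^{k-1}} & 1_{2^{k-1}\times 2^{k-1}} \end{bmatrix} + \begin{bmatrix} Y_{2^{k-1}}^tY_{2^{k-1}} - 1_{2^{k-1}\times 2^{k-1}} & 0 \\ 0 & Y_{2^{k-1}}^tY_{2^{k-1}} - 1_{2^{k-1}\times 2^{k-1}} \end{bmatrix} \\
&= \begin{bmatrix} Y_{2^{k-1}}^tY_{2^{k-1}} + 1_{2^{k-1}\times 2^{k-1}} & 0 \\ 0 & Y_{2^{k-1}}^tY_{2^{k-1}} + 1_{2^{k-1}\times 2^{k-1}} \end{bmatrix}.
\end{align*}

The rest of the proof follows a process similar to the proof of correctness of the eigenvalues and eigenvectors of $H_{2^k}^tH_{2^k}$. □ 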
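
The entries of Table 1 can be spot-checked for a small domain. The following sketch builds the hierarchical and Haar strategies for $n$ a power of 2 and applies $A^tA$ to candidate eigenvectors. The function names are hypothetical, and the row order used here is one of several equivalent choices (row order affects neither $A^tA$ nor the sensitivity).

```python
from typing import List

Matrix = List[List[int]]


def hierarchical(n: int) -> Matrix:
    """H_n: one row of 0/1 coefficients per node of the binary tree over n leaves."""
    rows, width = [], n
    while width >= 1:
        for start in range(0, n, width):
            rows.append([1 if start <= j < start + width else 0 for j in range(n)])
        width //= 2
    return rows


def haar(n: int) -> Matrix:
    """Y_n: the all-ones row plus one +1/-1 row per internal node of the tree."""
    rows, width = [[1] * n], n
    while width >= 2:
        half = width // 2
        for start in range(0, n, width):
            rows.append([1 if start <= j < start + half
                         else -1 if start + half <= j < start + width
                         else 0 for j in range(n)])
        width //= 2
    return rows


def sensitivity(a: Matrix) -> int:
    """Largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def gram_times(a: Matrix, v: List[int]) -> List[int]:
    """Compute (A^t A) v without forming A^t A explicitly."""
    av = [sum(c * x for c, x in zip(row, v)) for row in a]
    return [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(len(v))]
```

For $n = 8$ both strategies have sensitivity $1 + \log_2 8 = 4$, and applying `gram_times` to `[1, -1, 0, 0, 0, 0, 0, 0]` returns the same vector scaled by 1 for the hierarchical strategy and by 2 for the Haar strategy, as the first row of Table 1 states.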



Let $D_H$ be the diagonal matrix whose entries are square roots of eigenvalues in Table 1 and $P_H$ be the matrix whose 
row vectors are the normalization of eigenvectors in Table 1. Then $D_HP_H$ gives an eigendecomposition of $H_n^tH_n$, in the sense that 
$(D_HP_H)^t(D_HP_H) = H_n^tH_n$. Now let us compute the $L_1$ norm of the $i$-th column of $D_HP_H$. Notice the fact that for each eigenvalue $2^j - 1$, $1 \le j \le k$, there exists 
exactly one eigenvector $v_j$ in Table 1 which corresponds to this eigenvalue and has a non-zero $i$-th entry. Moreover, since 
there are $2^j$ entries in $v_j$ that are $\pm 1$ and all other entries in $v_j$ are 0, the nonzero entries in the normalization of $v_j$ are $\pm 2^{-j/2}$. 
Since the entries in the normalized eigenvector that corresponds to eigenvalue $2^{k+1} - 1$ are $2^{-k/2}$, the $L_1$ norm of the $i$-th column 
is: 

\[ \sum_{j=1}^{k} 2^{-j/2}\sqrt{2^j - 1} + 2^{-k/2}\sqrt{2^{k+1} - 1} = \sum_{j=1}^{k}\sqrt{1 - 2^{-j}} + \sqrt{2 - 2^{-k}}. \]

Therefore the sensitivity of $D_HP_H$ is $\sum_{j=1}^{k}\sqrt{1 - 2^{-j}} + \sqrt{2 - 2^{-k}}$. Consider the function $f(k) = k + 1 - \sum_{j=1}^{k}\sqrt{1 - 2^{-j}} - \sqrt{2 - 2^{-k}}$; 
an easy computation will show that 

\[ f(k + 1) - f(k) = 1 + \sqrt{2 - 2^{-k}} - \sqrt{1 - 2^{-k-1}} - \sqrt{2 - 2^{-k-1}} > 0 \]

for any positive integer $k$. Since $f(1) \approx 0.06815 > 0$, we know $f(k) > 0$, which means the sensitivity of $D_HP_H$ is always 
smaller than the sensitivity of $H_n$. 
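
The gap $f(k)$ between the sensitivity of $H_n$ (which is $k + 1$) and the sensitivity of $D_HP_H$ is easy to tabulate. The sketch below simply evaluates the closed-form expressions above; the function names are hypothetical.

```python
import math


def sensitivity_dh_ph(k: int) -> float:
    """Closed form: sum_{j=1..k} sqrt(1 - 2^-j) + sqrt(2 - 2^-k)."""
    return sum(math.sqrt(1.0 - 2.0 ** -j) for j in range(1, k + 1)) + math.sqrt(2.0 - 2.0 ** -k)


def f(k: int) -> float:
    """Gap between the sensitivity of H_n (k + 1) and that of D_H P_H, for n = 2^k."""
    return k + 1 - sensitivity_dh_ph(k)
```

Evaluating `f` for increasing `k` shows a positive and increasing gap, consistent with the claim that $D_HP_H$ always has smaller sensitivity than $H_n$.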

Notice the fact that both $1_{1\times 2^k}$ and $[1_{1\times 2^{k-1}}, -1_{1\times 2^{k-1}}]$ are eigenvectors of $Y_n^tY_n$ corresponding to eigenvalue $2^k$, so we 
know $[1_{1\times 2^{k-1}}, 0_{1\times 2^{k-1}}]$ and $[0_{1\times 2^{k-1}}, 1_{1\times 2^{k-1}}]$ are also eigenvectors of $Y_n^tY_n$ corresponding to eigenvalue $2^k$. Let $D_Y$ be 
the diagonal matrix whose entries are square roots of eigenvalues in Table 1 and $P_Y$ be the matrix whose first $n - 2$ row 
vectors are normalizations of eigenvectors in Table 1 and whose last two row vectors are normalizations of $[1_{1\times 2^{k-1}}, 0_{1\times 2^{k-1}}]$ and 
$[0_{1\times 2^{k-1}}, 1_{1\times 2^{k-1}}]$. Then $D_YP_Y$ gives an eigendecomposition of $Y_n^tY_n$. Similar to $D_HP_H$, the $L_1$ norm of the $i$-th column of 
$D_YP_Y$ is: 

\[ \sum_{j=1}^{k-1} 2^{-j/2}\sqrt{2^j} + 2^{-(k-1)/2}\sqrt{2^k} = k + \sqrt{2} - 1. \]

Therefore the sensitivity of $D_YP_Y$ is $k + \sqrt{2} - 1$. 

F.2 Error analysis 

Based on the eigen-decomposition in the previous section, we can now formally analyze the error of $H_n$ and $Y_n$. 
Theorem 11. For any linear counting query $w$, 

\[ \tfrac{1}{2}\,\mathrm{Error}_Y(w) \le \mathrm{Error}_H(w) \le 2\,\mathrm{Error}_Y(w). \]

Proof. Recall that the error for any given query $w$ and strategy $A$ is $\frac{2}{\epsilon^2}\Delta_A^2\, w^t(A^tA)^{-1}w$. Since $H_n$ and $Y_n$ have the 
same sensitivity, we need only compare the profile term $w^t(A^tA)^{-1}w$. Let $P_M^tD_MP_M$ be the spectral decomposition of 
$(A^tA)^{-1}$. Notice that: 

\begin{align*}
\mathrm{trace}(w^t(A^tA)^{-1}w) &= \mathrm{trace}(w^tP_M^tD_MP_Mw) \\
&= \mathrm{trace}(P_Mww^tP_M^tD_M).
\end{align*}

Since $H_n$ and $Y_n$ have the same eigenvectors, the only difference in error is due to the difference in eigenvalues. From Table 1 
we know the ratio between their corresponding eigenvalues is in the range $[\frac{1}{2}, 2]$, and that all eigenvalues are positive. Therefore, the 
ratio between their errors of answering $w$ is in $[\frac{1}{2}, 2]$. □ 

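Theorem 12 (Maximum and Total Error). The maximum and total error on workloads $W_R$ and $W_{01}$ using strategies 
$H_n$, $Y_n$, and $I_n$ is given by: 

MaxError | $H_n$ | $Y_n$ | $I_n$
$W_R$ | $\Theta(\log^3 n/\epsilon^2)$ | $\Theta(\log^3 n/\epsilon^2)$ | $\Theta(n/\epsilon^2)$
$W_{01}$ | $\Theta(n\log^2 n/\epsilon^2)$ | $\Theta(n\log^2 n/\epsilon^2)$ | $\Theta(n/\epsilon^2)$

TotalError | $H_n$ | $Y_n$ | $I_n$
$W_R$ | $\Theta(n^2\log^3 n/\epsilon^2)$ | $\Theta(n^2\log^3 n/\epsilon^2)$ | $\Theta(n^3/\epsilon^2)$
$W_{01}$ | $\Theta(n2^n\log^2 n/\epsilon^2)$ | $\Theta(n2^n\log^2 n/\epsilon^2)$ | $\Theta(n2^n/\epsilon^2)$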



Proof. Since W„ and H„ are asymptotically equivalent, we can derive the error bounds for either. We analyze the error 
of W„. Let n = 2*=, consider the range query [2*= - |(4'-t^J+i - 1), 2*= + | (4^^^-!+^ - 1)]. The error of this query is e(log^ n), 



which follows from algebraic manipulation of Equation 1, facilitated by knowing the eigen decomposition of $W_n^tW_n$. 
Since Xiao et al. [17] have already shown that the worst case error of $W_n$ is $O(\log^3 n)$, we know the maximum error of 
answering any query in $W_R$ is $\Theta(\log^3 n)$. 

Moreover, it follows from algebraic manipulation that the error of answering any query $w$ where the number of non-zero 
entries is 1 is $O(\log^2 n)$. Therefore the error of any 0-1 query is $O(n\log^2 n)$. Consider the query $(0, 1, 0, 1, \ldots, 0, 1)$: it can 
be shown to have error $\Theta(n\log^2 n)$. Therefore the maximum error of answering any query in $W_{01}$ is $\Theta(n\log^2 n)$. 

Recall 

\[ \mathrm{TotalError}_A(W) = \frac{2}{\epsilon^2}\Delta_A^2\,\mathrm{trace}(W(A^tA)^{-1}W^t). \]

Total error of workloads $W_R$, $W_{01}$ can be computed by applying the equation above to strategies $H_n$, $W_n$ and $I_n$. □ 

Figure 5: For the error profiles $M_1$ and $M_2$ described in Example 8, this figure shows the ellipses defined 
by $wM_1w^t = 1$, a circle, and $wM_2w^t = 1$, an ellipse rotated 45°. The profile term $f = wM_1w^t$ is an elliptic 
paraboloid coming out of the page, centered around the z axis. 



G. THE GEOMETRY OF A STRATEGY 

Finding the optimal strategy for a given workload will require considering both the shape of profile and the sensitivity, as 
discussed in Section 5. We use an example to demonstrate the geometry of the error profile in Sec. G.l. In Sec. G.2, we look 
at the geometry of sensitivity and how the $Q_A$ matrix of the decomposition of $A$ affects the sensitivity of $A$. 

G.1 The geometry of the error profile 

As discussed in Sec. 4, for any strategy $A$, the error profile $M = (A^tA)^{-1}$ is a positive definite matrix, that is, one for 
which $wMw^t > 0$ whenever $w \neq 0$. The following example gives the geometry of two error profiles. 

Example 8. Figure 5 shows the ellipses corresponding to two error profiles $M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $M_2 = \begin{bmatrix} 2 & -1.5 \\ -1.5 & 2 \end{bmatrix}$. Each point 
in the x-y plane corresponds to a query $w = [c_1, c_2]$. Those points on each ellipse correspond to queries such that $wMw^t = 1$. 
$M_1$ is a circle whereas $M_2$ is a stretched and rotated ellipse. The figure shows that $wM_2w^t < wM_1w^t$ for queries near and 
along the line $y = x$. 

The profile $M_1$ has eigenvalues (1, 1) and eigenvectors $P_M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, indicating no stretching or rotation. The profile $M_2$ 
has eigenvalues (7/2, 1/2) and its eigenvectors correspond to a 45° rotation, indicating that the major axis is stretched (by a 
$\sqrt{7}$ ratio to the minor axis) and rotated to align with $y = x$. 

The decomposition can also guide the design of new strategies. We can design the error profile by choosing values for 
the diagonal, and choosing a rotation. Stretching and rotating in a direction makes queries in that direction relatively more 
accurate than queries in the other directions. 

G.2 The geometry of sensitivity 

While we can design a strategy to obtain a desired profile, the error depends not only on the profile, but also on the 
sensitivity. For example, we can apply Theorem 2 to the previous example. 

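Example 9. Applying Theorem 2, we can obtain query strategies $A_1$ and $A_2$ that achieve profiles $M_1$ and $M_2$ respectively. 

\[ A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A_2 = \begin{bmatrix} \sqrt{2/7} & 0 \\ 0 & \sqrt{2} \end{bmatrix}\begin{bmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{bmatrix} = \begin{bmatrix} 1/\sqrt{7} & -1/\sqrt{7} \\ 1 & 1 \end{bmatrix} \]

As shown above, $A_2$ has higher sensitivity than $A_1$, so while it is more accurate for queries along $y = x$, the difference 
is less pronounced than Figure 5 might suggest. 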
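
The numbers in Examples 8 and 9 can be reproduced with a few lines of arithmetic. This sketch uses the strategies $A_1$ and $A_2$ as written above; the helpers are hypothetical and handle only two-column strategies, and the common factor $2/\epsilon^2$ is omitted because it does not affect the comparison.

```python
import math
from typing import List

Matrix = List[List[float]]


def sensitivity(a: Matrix) -> float:
    """Largest column sum of absolute values."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def profile_term(w: List[float], a: Matrix) -> float:
    """w (A^t A)^{-1} w^t for a full-rank strategy with exactly two columns."""
    g00 = sum(r[0] * r[0] for r in a)
    g01 = sum(r[0] * r[1] for r in a)
    g11 = sum(r[1] * r[1] for r in a)
    det = g00 * g11 - g01 * g01
    return (w[0] * w[0] * g11 - 2.0 * w[0] * w[1] * g01 + w[1] * w[1] * g00) / det


def scaled_error(w: List[float], a: Matrix) -> float:
    """Sensitivity squared times the profile term (error up to the factor 2/eps^2)."""
    return sensitivity(a) ** 2 * profile_term(w, a)


A1 = [[1.0, 0.0], [0.0, 1.0]]
A2 = [[1.0 / math.sqrt(7.0), -1.0 / math.sqrt(7.0)], [1.0, 1.0]]
```

For the query $w = (1, 1)$ along $y = x$, the profile term drops from 2 under $A_1$ to 1 under $A_2$, but the sensitivity rises from 1 to $1 + 1/\sqrt{7}$, so `scaled_error` improves only from 2 to roughly 1.9.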



The sensitivity of a query strategy is determined by its columns. If $A$ is decomposed as $A = Q_AD_AP_A$ then the columns 
of $P_A$ are orthogonal vectors, but $D_A$ stretches the axes so that the columns are no longer necessarily orthogonal. The matrix 
$Q_A$ then rotates the column vectors of $D_AP_A$, but we know that any such rotation will not impact the error profile. The 
rotation does impact the sensitivity, because each rotation changes the column vectors and therefore changes the maximum 
absolute sum of the column vectors. Since sensitivity is measured by the $L_1$ norm of the column vectors, we can think of an $L_1$ 
"ball" (it is actually diamond shaped) which consists of all points with $L_1$ norm equal to a constant $c$. If we view the column 
vectors of $A$ as points in $n$ dimensional space, the sensitivity is the radius of the smallest $L_1$ ball that contains the points. Minimizing the 
sensitivity of a given profile (Problem 2) is therefore equivalent to finding the rotation of the columns in $A$ that permits them 
to be contained in the smallest $L_1$ ball. 
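
The effect of the rotation $Q_A$ can be seen in a two-dimensional sketch. The code below is illustrative only: it assumes a strategy with two rows and two columns, and `rotate` is a hypothetical helper that left-multiplies by a planar rotation matrix.

```python
import math
from typing import List

Matrix = List[List[float]]


def sensitivity(a: Matrix) -> float:
    """Largest L1 norm of a column, i.e. the radius of the smallest L1 ball
    containing all column vectors."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def gram(a: Matrix) -> Matrix:
    """A^t A; its inverse is the error profile."""
    n = len(a[0])
    return [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]


def rotate(a: Matrix, theta: float) -> Matrix:
    """Return Q A, where Q is the 2x2 rotation by theta and A has two rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * a[0][j] - s * a[1][j] for j in range(len(a[0]))],
            [s * a[0][j] + c * a[1][j] for j in range(len(a[0]))]]
```

Starting from `[[2, 0], [0, 1]]`, a rotation by 45° leaves `gram` unchanged up to rounding (so the error profile is the same) but raises the sensitivity from 2 to $2\sqrt{2}$, which is why the choice of rotation matters when minimizing sensitivity for a fixed profile.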



